Algorithmic Solution for Autonomous Vision-Based Off-Road Navigation
Marina Kolesnik (a), Gerhard Paar (b), Arnold Bauer (b), Michael Ulm (c)

(a) Institute for Applied Information Technology, GMD-SET, Germany
    kolesnik@gmd.de, Fax ++49 2241 / 14 - 2072
(b) JOANNEUM RESEARCH, Graz, Austria
    gerhard.paar@joanneum.ac.at, Fax ++43 316 / 876 - 720
(c) University of Ulm, Germany
    ulm@mathematik.uni-ulm.de, Fax ++49 731 / 502 - 3579
ABSTRACT
A vision-based navigation system is a basic tool for providing autonomous operation of unmanned vehicles. For off-road navigation this means that a vehicle equipped with a stereo vision system, and perhaps a laser ranging device, shall be able to maintain a high level of autonomy under various illumination conditions and with little a priori information about the underlying scene. The task becomes particularly important for unmanned planetary exploration with the help of autonomous rovers. For example, in the LEDA Moon exploration project currently under study by the European Space Agency (ESA), the vehicle (rover) should perform the following operations during the autonomous mode: on-board absolute localization, digital elevation model (DEM) generation, obstacle detection and relative localization, and global path planning and execution.
The focus of this article is a computational solution for fully autonomous path planning and path execution. An operational DEM generation method based on stereoscopy is introduced. Self-localization on the DEM and robust natural feature tracking are used as basic navigation steps, supported by inertial sensor systems. The following operations are performed on the basis of stereo image sequences: 3D scene reconstruction, risk map generation, local path planning, camera position update during motion on the basis of landmark tracking, and obstacle avoidance.
Experimental verification is done with the help of a laboratory terrain mockup and a high-precision camera mounting device. It is shown that standalone tracking using automatically identified landmarks is robust enough to provide navigation data for further stereoscopic reconstruction of the surrounding terrain. Iterative tracking and reconstruction leads to a complete description of the vehicle path and its surroundings with an accuracy high enough to meet the specifications for autonomous outdoor navigation.
Keywords: Autonomous Navigation, Stereo Reconstruction, Vehicle, Stereo Matching, Computer Vision.
1. INTRODUCTION
Major advances in autonomous navigation based on computer vision techniques have opened the possibility of using autonomous rovers to explore the planets. The application of these recent technological achievements to planetary rovers is a challenging task outlined in the LEDA Moon exploration project [1]. Three roving control modes must be supported by the Vision-Based Navigation (VBN) system according to the LEDA mission objectives:

1. teleoperated mode, i.e. ground-based piloting;
2. medium autonomy mode, or autonomous piloting with ground-based navigation;
3. full autonomy mode, or autonomous navigation.

This work was supported in part by the Austrian Science Foundation (FWF) under grants S7003-MAT and M00265-MAT, and by JOANNEUM RESEARCH.
Stereo vision techniques offer a number of advantages in solving the task of autonomous vehicle navigation. They rely on low-cost video technology which uses little power; the sensor is mechanically reliable and emits no signature (unlike laser range-finding). A stereo system also allows more flexibility: most of the work during stereo range image production is performed by software, which can easily be adapted to a variety of situations. The same applies to the path planning software block: the logic of the path planning procedure can easily be changed depending upon the goal to be reached on a planetary surface.
There have been several attempts by different teams around the world to integrate stereo vision into rover navigation systems. The construction of the first wheeled rover that successfully used stereo, named Robby, was completed in December 1989 at JPL [2]. Another early stereo vehicle, named PVS, was developed by Nissan [3]. Faster computers and advanced algorithms have opened new perspectives in stereo vision. A robust stereo system has been realised on the robotic truck NAVLAB [4] developed by Carnegie Mellon University. More time-demanding trinocular stereo vision has been employed on the 8-legged walker Dante [5], also developed at Carnegie Mellon. The idea of using three stereo cameras was justified in the work by Dhond and Aggarwal [6], who found that the increase in computation due to a third camera was only 25% while the decrease in false matches was greater than 50%.
French teams have been working on autonomous navigation systems for several years. Their approach, proven on the Marsokhod rover [7], combined a chain of sequential operations such as stereo reconstruction, obstacle location and path planning. Nevertheless, both Dante and the French system imposed a parallel geometry condition through the stereo reconstruction algorithms used. This condition weakens the whole system, because parallel stereo geometry is difficult to arrange and maintain in practice. Even small distortions in epipolar stereo image geometry have an immediate impact on the reliability of stereo matching. Therefore, an additional rectification procedure must be employed beforehand [8].
This paper presents a concept for a fully autonomous mode of a rover equipped with an active/passive imaging sensor setup. Our approach combines both stereo- and mono-vision techniques. The 3D stereo reconstruction algorithm is based on almost arbitrary stereo geometry, which makes it stable against accidental distortions of the vision system arrangement. Tracking techniques are employed during the path execution step to control the vehicle locomotion along the local path. It can be stated that optimized versions of the algorithms involved in the closed-loop processing chain are suitable for on-board implementation. The reliability of the proposed algorithmic solution is demonstrated in simulation sessions with the help of an accurate robot and a Lunar terrain mockup.
2. SENSOR SYSTEMS
2.1. Stereo Vision System Arrangement and Calibration
The experience of several teams working on autonomous navigation has proved that parallel geometry of stereo cameras is difficult to arrange and maintain. A special platform has been designed to adjust and maintain parallel geometry for the three stereo cameras on "Dante", the autonomous vehicle for Antarctic applications [5]. In the case of the Russian Marsokhod rover [9], both stereo cameras were fixed on top of a special device performing accurate rotation about three axes.
Evidently, parallel stereo geometry can easily be distorted by vibrations during motion or by day/night temperature variations. Therefore, a 3D stereo reconstruction approach based on almost arbitrary stereo geometry is preferable and more reliable. Consequently, the necessary calibration procedure for the stereo system is composed of three steps:
1. Intrinsic parameter (focal length, principal point, lens distortion) calibration. It is performed on-ground during system assembly and is considered unchanged (or is recalculated depending upon the known temperature conditions) during vehicle operations.

2. Initial calibration of camera positions and orientations with respect to the vehicle frame. One important parameter is the distance between the stereo cameras, which can be considered unchanged. This sensor description is performed during system integration and can be used for 3D reconstruction of the environment and as a prediction during navigation. Moreover, a very accurate relative orientation between the cameras can be accomplished by utilizing results from stereo matching [10].

3. Online extrinsic (camera position and orientation) calibration. These parameters must be updated with respect to a given coordinate system while the vehicle is moving. Auxiliary sensors (Section 2.2) can help to avoid drifts and provide predictions for attitude parameters.
2.2. Auxiliary Sensors
A set of auxiliary sensors (measurement units) must be integrated in the vehicle navigation system to
provide additional independent information regarding vehicle position and orientation. These are:
- Magnetometer, to know the general orientation of the vehicle and to make it follow the desired direction.

- Wheel odometers and accelerometers, as additional control units. The vehicle operations are stopped immediately if the data obtained from these sensors exceed predefined safety thresholds.

- Star tracker, for independent self-localization of the vehicle on a planetary surface other than the Earth.

- Gyroscope / inclinometer, the only source of rotation data, used both for ortho DEM generation based on stereo matching and for the image sequence tracking procedure. The tracking itself does not presume any relative rotation between subsequent image frames. In case the relative rotation between two frames cannot be neglected, one can use the sensor rotation data to derotate the corresponding image frames.

- Laser range-finding (Ladar). This optional sensor can be effectively used for quick monitoring and obstacle detection in the close vicinity of the vehicle.
3. CLOSED LOOP SOLUTION FOR AUTONOMOUS NAVIGATION
3.1. Conditions
The vehicle Vision-Based Navigation (VBN) system must operate under a set of specific conditions and requirements which are present on a planetary surface, where immediate human intervention is in fact impossible. These are:

1. Automatic initial calibration of the vision sensors.
2. Absence of accurate reference points or landmarks for precise self-calibration.
3. Low-angle illumination conditions.
4. Low-angle viewing conditions.
5. No a priori information about the underlying terrain.
On the other hand, there are two conditions which simplify the operation of the vehicle VBN system:

1. Low speed of the vehicle operations. The vehicle can move in a stop-and-think mode, performing the time-consuming stereo reconstruction while it is stopped.
2. Static environmental conditions. The vehicle's VBN system operates in a still scene, as no moving objects are expected within its field of view.
3.2. Navigational Steps
To accomplish a fully autonomous vehicle navigation mode we suggest using a combination of stereo- and mono-vision techniques. Time-consuming stereo matching is employed during the stop-thinking vehicle mode to calculate a DEM, to classify the underlying scene, and to generate a safe local path. A less time-consuming tracking procedure is employed during the path execution mode to control the vehicle displacements along the path. Summarizing, we propose the following approach to accomplish the vehicle's autonomous mode:
1. Initialization phase. This step is performed only once to initialize the operational units of the vehicle. These are: initial calibration of the imagery sensors (laser, stereo cameras) and measuring units (wheel odometer / accelerometer / inclinometer sensors, gyroscope, star tracker), and self-localization of the vehicle position either with respect to a given global map of the surface or with respect to another vehicle.

2. Operational phase. This step consists of the following operations implemented in a cycle (a code sketch of the loop follows below):

   Stop-thinking mode:
   1. Stereo image acquisition;
   2. DEM reconstruction;
   3. Risk map generation;
   4. Local path generation;

   Path execution mode:
   5. Consecutive image acquisition;
   6. Landmarks tracking;
   7. Update of the vehicle position (calibration update).
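The control flow of the operational phase can be summarized as follows. This is a minimal sketch of the loop structure only; every name on `vehicle` (`reached`, `acquire_stereo_pair`, and so on) is a hypothetical placeholder, not an interface defined in this paper:

```python
def autonomous_cycle(vehicle, goal):
    """Operational-phase cycle of Section 3.2 (illustrative sketch;
    all vehicle methods are hypothetical placeholders)."""
    while not vehicle.reached(goal):
        # Stop-thinking mode: the vehicle stands still while the
        # time-consuming stereo chain runs.
        left, right = vehicle.acquire_stereo_pair()
        dem = vehicle.reconstruct_dem(left, right)        # Section 4.4
        risk = vehicle.risk_map(dem)                      # Section 4.6
        path = vehicle.plan_local_path(risk, dem, goal)   # Section 4.7
        # Path execution mode: move along the local path, tracking
        # landmarks and updating the calibration at each frame.
        for waypoint in path:
            vehicle.move_towards(waypoint)
            frame = vehicle.acquire_image()
            tracks = vehicle.track_landmarks(frame)       # Section 5.2.1
            vehicle.update_calibration(tracks, dem)       # Section 5.2.2
```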
4. 3D ENVIRONMENT RECONSTRUCTION AND PATH PLANNING
4.1. Reconstruction Principle
One of the most widely used methods for surface reconstruction in photogrammetry is stereoscopy. A scene point is projected into different locations in the stereo images. When the sensor geometry is known, these locations can be used to obtain the position of the point in 3D space. The reconstruction of entire surfaces is performed by applying this principle to many or all pixels of the stereo images provided. Machine vision for the stereoscopic reconstruction of surfaces can be split roughly into the following major steps:
1. Image orientation (Calibration) to describe the geometry of the sensing devices.
2. Image matching to get corresponding points in two or more images of the same scene.
3. 3D reconstruction from the correspondences to calculate points on the surface to be reconstructed.
In the following we use the ideal pinhole camera model [11].
4.2. Stereo Correspondence
To solve the local correspondence problem, a large variety of core matching methods have been published for many applications. None claims to offer an overall solution, but in specific areas the performance of the given methods is promising enough to prove feasibility. Probably the best known approach uses local correlation coefficients to describe local similarities [12]. To overcome the numerical complexity of a single match, some of the older implementations used Laplacians of Gaussians on selected points [13]. The idea of using local primitives evolved into a large variety of different local attributes like edges, corners or local phase [14]. Syntactic methods like labeling [15] or contour detection [16] further decrease the numerical effort, but introduce combinatoric complexity. Most of the newer approaches rely on hybrid algorithms or active vision [17].
Area-based matching methods are usually better suited to well-textured scenes with good visual similarity, but they are sensitive to occlusions and noise. Feature-based methods can better handle scenes with larger depth variations, but in most cases they need accurate geometric predictions (e.g. the epipolar constraint) to overcome ambiguities, especially in the case of repetitive patterns.
In general, the main problems of common stereo vision systems are as follows:

- Similarity in the 2D images is not an objective criterion for correspondence.
- Local constraints like the smoothness assumption, the ordering constraint, the rigidity assumption, or the condition of linear variation [18] are correlated with each other. Occlusions destroy some of these constraints, while others remain locally applicable.
- The computational effort is still very high when applying sophisticated hybrid algorithms for dense surface descriptions.
- The applications of state-of-the-art algorithms are very restricted.
One of the main reasons to advance the idea of using local features for matching, instead of area-based correlation or single-primitive methods, was the high density of matching points required for 3D surface reconstruction. The numerical description of several features and the definition of a statistical similarity measure result in the algorithm described in Section 4.3. It is shown in this paper that it can handle the correspondences used both for stereo reconstruction and for motion detection.
4.3. Hierarchical Feature Vector Matching (HFVM)
4.3.1. Feature Vector Matching (FVM)
Many published matching techniques deal with just one or, at most, two different properties of an image. These include grey levels, edges, corners, and other local primitives. A natural extension of this property-based matching philosophy is a combination of many of these features, which leads to a significant improvement of the stereo matching step, especially in terms of robustness. Such a method combines the advantages of several image features, whereas the particular disadvantages are compensated by the large variety of features. This approach to stereo matching is based upon the idea of creating a feature vector for each pixel and comparing these vectors in the images to be registered.
In the following, a feature is a value which numerically describes the neighborhood of a pixel location. Most of the features used here are described as convolutions or can be approximated by means of convolutions [19]. Calculating a certain feature for all pixels of an image results in a so-called feature image. In the following, a method is presented that matches pixels by comparing a number of features.
Suppose there are m features. All features of one location are collected in the feature vector \vec{f} for this pixel. From the contents of the feature images this vector can be derived for each pixel of the stereo image pair. Finding a match is performed by comparing a feature vector of the reference image, the reference vector, to all feature vectors of the search area, which is a part of the search image. The reference image and the search image are named r and s, respectively, and the 'images' consisting of the corresponding feature vectors \vec{r} and \vec{s}, respectively. Then, for a point p, \vec{r}(p) is the feature vector of p in the reference image and \vec{s}(p) the feature vector of p in the search image. The l-th component of a vector \vec{f} is denoted by f_l.
Table 1 lists a set of features currently in practical use.
In order to compare a reference vector to a search vector, the feature distance between the two vectors is computed. The feature distance is defined such that each component of the vectors is weighted. If the weight of feature l is denoted as w_l, then the feature distance between the vectors \vec{f} and \vec{g} is defined as the Euclidean distance:

\[ |\vec{f} - \vec{g}| = \sqrt{ \frac{\sum_{l=1}^{m} \left( (f_l - g_l)\, w_l \right)^2}{\sum_{l=1}^{m} w_l^2} } \qquad (1) \]
f_i   Property              Kernel                                  n_i   w_i
f0    Horizontal Highpass   (2 1 0 -1 -2)                            4     2
f1    Vertical Highpass     (2 1 0 -1 -2)^T                          4     2
f2    Horizontal Bandpass   (1 0 -1 0 1 0 -1)                        2     3
f3    Vertical Bandpass     (1 0 -1 0 1 0 -1)^T                      2     3
f4    Horizontal Bandpass   (2 1 0 -1 -2 -1 0 1 2 1 0 -1 -2)         8     3
f5    Vertical Bandpass     (2 1 0 -1 -2 -1 0 1 2 1 0 -1 -2)^T       8     3
f6    Local Variance [20]   -                                        4     -
f7    Gaussian              5x5 kernel below                        56    10

                            0 1 2 1 0
                            1 3 5 3 1
                            2 5 8 5 2
                            1 3 5 3 1
                            0 1 2 1 0

Table 1. FVM feature set (example). n_i are normalization factors, w_i are weights.
It has been shown that the absolute difference can be used instead of the Euclidean distance without loss of accuracy.
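A minimal sketch of feature image generation and the distance of Eq. (1), assuming NumPy/SciPy; the example kernels and normalization factors follow the reconstruction in Table 1, and all function names are illustrative:

```python
import numpy as np
from scipy.ndimage import convolve

def feature_images(img, kernels, norms):
    """One feature image per kernel (cf. Table 1), each divided by its
    normalization factor n_i; result has shape (m, rows, cols)."""
    return np.stack([convolve(img.astype(float), k) / n
                     for k, n in zip(kernels, norms)])

def feature_distance(f, g, w):
    """Weighted Euclidean feature distance of Eq. (1); the absolute
    difference mentioned above would work as well."""
    return np.sqrt(np.sum(((f - g) * w) ** 2) / np.sum(w ** 2))

# Example: the two highpass features of Table 1 (as reconstructed).
hp = np.array([[2, 1, 0, -1, -2]], dtype=float)
kernels = [hp, hp.T]
norms = [4.0, 4.0]
weights = np.array([2.0, 2.0])
```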
Computing, for a point p, the distance between \vec{r}(p) and each vector in the search image is in general too expensive. In practice, the search can be restricted to a certain search space Ω_p. This search space is defined by the search area, i.e., the center (i, j)_p (which is assumed to be given) and the extensions Δh and Δv (which are the same for all points):

\[ \Omega_p = \{\, \vec{s}(q) \mid q \in [i - \Delta h,\, i + \Delta h] \times [j - \Delta v,\, j + \Delta v] \,\} \qquad (2) \]

For a point p, the best correspondence is found at the position q_p in the search space where the distance between the reference vector and the search vector is minimal, i.e.,

\[ |\vec{r}(p) - \vec{s}(q_p)| = \min_{\vec{f} \in \Omega_p} |\vec{r}(p) - \vec{f}| \qquad (3) \]
The principle of Feature Vector Matching is depicted in Figure 1 (left). The algorithm is split into the following parts:

1. Create feature images for both the reference and the search image.
2. Compare each reference vector to all search vectors of the search space. The best correspondence is found where the feature distance is minimal. The difference in x- and y-coordinates is stored as the disparity vector. If the minimum feature distance exceeds a given threshold, the correspondence is invalid and the reference pixel is not matchable; as a result, the disparity for the reference pixel remains undefined.
3. Remove errors and interpolate undefined disparities.
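A brute-force sketch of one FVM match (Eqs. 2 and 3), reusing the hypothetical `feature_distance` helper from above:

```python
def fvm_match(ref_feat, srch_feat, p, center, dh, dv, w, d_max):
    """Match one reference pixel p: scan the search window around
    `center` in the search feature images, keep the position with
    minimal feature distance, and reject the match if that minimum
    exceeds the threshold d_max."""
    (pi, pj), (ci, cj) = p, center
    rows, cols = srch_feat.shape[1:]
    best_d, best_q = np.inf, None
    for qi in range(max(ci - dh, 0), min(ci + dh, rows - 1) + 1):
        for qj in range(max(cj - dv, 0), min(cj + dv, cols - 1) + 1):
            d = feature_distance(ref_feat[:, pi, pj],
                                 srch_feat[:, qi, qj], w)
            if d < best_d:
                best_d, best_q = d, (qi, qj)
    if best_d > d_max:
        return None                              # pixel not matchable
    return (best_q[0] - pi, best_q[1] - pj)      # disparity vector
```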
4.3.2. Subpixel Matching
Within the search space Ω_p of each reference pixel p (Eq. 2), the feature vector distances |\vec{r}(p) - \vec{s}(q)| (Eq. 3, q ∈ Ω_p) describe a small image. Around the minimum location q_min, the neighboring feature vector distances are used for a linear interpolation in each direction (row and column).

It turned out that the distribution of subpixel disparities between -0.5 and 0.5 is not uniform. Therefore, a lookup table is defined which maps subpixel values in the interval (-0.5, 0.5) onto itself to obtain a uniform distribution. This can be done in one learning step, i.e. applying FVM to some typically textured images and analyzing the histogram of the subpixel values.
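The paper interpolates linearly in each direction around the minimum; a symmetric V-shaped fit through the three neighboring distances is one concrete way to realize this, sketched below for one direction (the helper name is illustrative):

```python
def subpixel_offset(d_minus, d_min, d_plus):
    """Sub-pixel refinement in one direction (Sec. 4.3.2): fit a
    symmetric V through the distances at offsets -1, 0, +1 and return
    the offset of its minimum, which lies in (-0.5, 0.5)."""
    denom = 2.0 * (max(d_minus, d_plus) - d_min)
    if denom == 0.0:
        return 0.0                    # flat neighborhood: no refinement
    return (d_minus - d_plus) / denom
```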
4.3.3. Consistency Check
To measure the consistency of the disparities, matching from right to left is performed as well. This is called backmatching. To each point l of the left image, the left disparity map is applied. The result is r. Next, the right disparity map is applied to r, resulting in l'. The match is defined invalid when the distance between l and l' exceeds one pixel.
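A sketch of this check, assuming disparity maps stored as (2, rows, cols) arrays of row/column offsets (this layout is an assumption for illustration):

```python
def backmatch_valid(disp_lr, disp_rl, l, tol=1.0):
    """Backmatching (Sec. 4.3.3): map left pixel l into the right
    image via the left-to-right disparity, map the result back via
    the right-to-left disparity, and accept the match only if l is
    reproduced within `tol` (one pixel)."""
    li, lj = l
    ri = li + disp_lr[0, li, lj]
    rj = lj + disp_lr[1, li, lj]
    ki, kj = int(round(ri)), int(round(rj))      # nearest right pixel
    l2i = ri + disp_rl[0, ki, kj]
    l2j = rj + disp_rl[1, ki, kj]
    return np.hypot(l2i - li, l2j - lj) <= tol
```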
4.3.4. Hierarchical Feature Vector Matching
In order to evaluate the center of the search area for each point and to improve the robustness and efficiency of the matching algorithm, pyramids of the input images are generated [21]. Level 0 of the pyramid is the original image. To create the next level, the average grey level of four pixels in a square is computed and stored as one pixel of a new image. Matching starts at the top level of the pyramid with large search areas for each pixel. The resulting disparity map is smoothed, and undefined disparities are interpolated, before it is used as the initial disparity map (defining the centers of the search areas) for matching the next lower level of the pyramid.
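The pyramid construction described above (2x2 averaging) in a minimal sketch:

```python
def build_pyramid(img, levels):
    """Grey-level pyramid of Sec. 4.3.4: level 0 is the original
    image; each following level stores the average of 2x2 pixel
    squares of the level below."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        a = pyramid[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:h, :w]                         # crop to even size
        pyramid.append((a[0::2, 0::2] + a[0::2, 1::2] +
                        a[1::2, 0::2] + a[1::2, 1::2]) / 4.0)
    return pyramid
```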
Figure 1. l: Feature Vector Matching principle. r: HFVM from pyramid level N to N-1.
Incorporating pyramids, backmatching and filter algorithms leads to Hierarchical Feature Vector Matching, or HFVM [22]. The major steps are as follows (Figure 1, r):

1. Build the pyramid.
2. Compute the feature images for each pyramid level.
3. Match the top level of the pyramid.
4. Filter the resulting disparity map.
5. Check matching consistency by backmatching.
6. Interpolate the undefined disparities.
7. Use the resulting disparity map as the initial disparity map to match the next lower pyramid level.

Steps 4 through 7 are repeated until a disparity map of Level 0 is computed.
4.4. Stereo Reconstruction from Disparities
Traditional recovery of scene topography for stereo vision using spatial forward intersection is not satisfactory for many applications. Whenever corresponding image points are directly projected into Cartesian object space using triangulation, the resulting elevation description is sparse and nonuniform.

The fundamental concept of the Locus reconstruction approach [23] is to work in image space rather than in object space. It requires knowledge of dense correspondences between left and right image space. For this reason it is well suited to fully exploit the dense and uniform disparity maps computed with HFVM.
The elevation at an arbitrary reference position is found by intersecting a hypothetical line at this location with the object's surface (Figure 2). This is done by projecting this hypothetical line into the stereo images (left and right Locus). The image information at the Locus location characterizes a profile curve on the object's surface. The corresponding location in the other image is found by mapping the Locus using the given dense disparities. The elevation is determined by the most consistent intersection between the Locus and the profile curve projection, taking into account the known heights along the curves [24].

Figure 2. Stereo Locus method: relation between object and image space when projecting a hypothetical straight line, in the case of left-to-right disparity mapping.
For easy utilization of the reconstruction results it is important to be able to project the measured surface onto an analytically describable shape. The Locus reconstruction method is able to handle arbitrary global shapes of cavities: in object space, straight lines perpendicular to a reference surface are defined. For a dense reconstruction they are preferably arranged in a regular grid thereon. In general, the locations of interest can be selected without restriction on a reference surface in object space. This allows specific regions to be exploited individually with different reference resolution (2D on the reference shape) but full height resolution (perpendicular to the shape).
Utilizing simple neighborhood relationships and using a pyramid structure to gradually refine the reference surface resolution permits a drastic reduction of the search space for consistent elevations. The adaptation to the multiresolution disparity map structure coming from HFVM makes it possible to provide an early, rough elevation description for a general view. A consistency check allows the quality and accuracy of the elevation, the disparity map and the calibration to be evaluated. The Locus approach is less sensitive to noise in the disparities than the spatial forward intersection method.

The result is an elevation map projected onto the reference surface. A byproduct is an ortho image: the grey-level projection of the acquired images onto the reference surface.
4.5. Description of the Environment
The DEMs and ortho images are reconstructed in a world coordinate system considered fixed at the motion initialization point of the vehicle. Therefore, elevation models from different vehicle positions can be merged to obtain a full 3D description of the surroundings of the vehicle path. Figure 7 shows an example of a merging result of 9 subsequent stereo reconstructions.
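The paper does not detail the merging rule; one simple strategy, assumed here purely for illustration, is to resample the DEMs onto a common world grid and average overlapping cells:

```python
def merge_dems(dems):
    """Merge DEMs resampled to a common world grid (Sec. 4.5), as an
    assumed averaging scheme: undefined cells are NaN, overlapping
    defined cells are averaged, and cells undefined in every DEM
    remain NaN."""
    return np.nanmean(np.stack(dems), axis=0)
```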
4.6. Risk Map Generation
A risk map is generated on the basis of the Locus DEM (Section 4.4). The basic idea is to select steep slopes and elevated areas from the DEM and to mark them as hazardous and unsuitable for vehicle motion by comparison with a predefined safety threshold. The algorithm evaluates the minimal and maximal heights of the DEM in a local window and then calculates the local inclination. The optimized version of the algorithm takes O(N) calculations per pixel, where N is the local window size.
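A sketch of this idea using running min/max filters (which admit the O(N)-per-pixel behavior mentioned above); `window`, `cell_size` and `max_slope_deg` are illustrative parameters, not values from the paper:

```python
from scipy.ndimage import maximum_filter, minimum_filter

def risk_map(dem, window, cell_size, max_slope_deg):
    """Risk map sketch for Sec. 4.6: local relief inside an N x N
    window yields a local inclination; cells steeper than the safety
    threshold are marked hazardous."""
    relief = (maximum_filter(dem, size=window)
              - minimum_filter(dem, size=window))
    slope_deg = np.degrees(np.arctan2(relief, (window - 1) * cell_size))
    return slope_deg > max_slope_deg      # True marks hazardous cells
```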
4.7. Local Path Planning
A path based on a Voronoi diagram [25] keeps an equal distance from obstacles but is computationally expensive. Our idea is to consider the points of the risk map located outside hazardous areas as the nodes of a directed graph [26]. The path is generated from a predefined start point and should reach a specified target point on the local map in such a way as to minimize the length of a subgraph (Dijkstra's algorithm [27], Figure 3).
Figure 3. Path planning task: directed graph on the image field. Each node in image row Y can be connected only with the three nodes from the previous row Y+1.
The start point of the path, e.g. the current position of the vehicle, is considered the graph origin. The lengths of the graph edges are positive values (weights) defined as the height difference between subsequent vertices. As usual, the length of a particular path joining any two given vertices of the graph is defined as the sum of the edge weights composing the path. Since the real local path must run continuously through the image field, the graph edges must be connected in a directed way: each node of the graph can be connected only with the three nodes from the previous image row. Under this restriction, the number of operations for searching the shortest path from the start point to the possible destination points is O(N), where N is the image size.
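Under the three-predecessor restriction, the shortest-path search reduces to a row-by-row dynamic program, sketched below (row indices run forward here, whereas Figure 3 counts rows backward; the start handling is simplified to any feasible cell of the first row):

```python
def plan_path(height, hazardous):
    """Directed-graph shortest path of Sec. 4.7 as a dynamic program.
    Edge weights are absolute height differences between subsequent
    vertices; hazardous cells are impassable."""
    rows, cols = height.shape
    cost = np.full((rows, cols), np.inf)
    pred = np.full((rows, cols), -1, dtype=int)
    cost[0, ~hazardous[0]] = 0.0               # feasible start cells
    for y in range(1, rows):
        for x in range(cols):
            if hazardous[y, x]:
                continue
            for px in (x - 1, x, x + 1):       # three allowed predecessors
                if 0 <= px < cols and np.isfinite(cost[y - 1, px]):
                    c = cost[y - 1, px] + abs(height[y, x] - height[y - 1, px])
                    if c < cost[y, x]:
                        cost[y, x], pred[y, x] = c, px
    x = int(np.argmin(cost[-1]))               # cheapest feasible target
    path = [(rows - 1, x)]
    for y in range(rows - 1, 0, -1):
        x = pred[y, x]
        path.append((y - 1, x))
    return path[::-1]
```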
5. PATH EXECUTION AND NAVIGATION ON THE DEM
5.1. Initial Localization of the Vehicle
The accuracy of the initial localization of the vehicle in a given world coordinate system depends upon the accuracy of the available auxiliary sensor systems and/or a global map. After vehicle deployment, a local map with respect to the vehicle (its position specifying the origin of the local coordinate system) can be used on-board for further vehicle operations. The vehicle position and orientation shall be constantly monitored on-board in the local coordinate system. This can be done with the help of the vision system, as described in Section 5.2.2.
5.2. Landmark Tracking for Position Update
5.2.1. Landmark Selection and 2D Tracking
A landmark tracking algorithm [28] is used to follow the tracks of homologous points (Interest Points) in subsequent images. The corresponding displacements between the Interest Points are then used as the data base for the calibration update. To extract the Interest Points from the original image, a derivative of the Moravec operator [29] is used. Using auxiliary sensors (Section 2.2), it is only possible to make coarse predictions of the motion between two consecutive images (frames) taken while the vehicle is moving. Therefore, we propose a hierarchical approach to identify corresponding Interest Points in subsequent frames. In a first iteration, the disparities of all points of the image are calculated in coarse resolution. Subsequently, this information is used to calculate the disparities only for the Interest Points in high resolution, using Feature Vector Matching.
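The paper uses a derivative of the Moravec operator; the classical form of that interest measure is sketched below for orientation (a slow reference implementation, not the variant actually used):

```python
def moravec_score(img, r=2):
    """Classical Moravec interest measure [29]: for each pixel, the
    minimum over four shift directions of the sum of squared
    grey-level differences in a (2r+1)^2 window; distinct local
    maxima of this score serve as Interest Points."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    score = np.zeros((h, w))
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            win = img[y - r:y + r + 1, x - r:x + r + 1]
            ssds = []
            for dy, dx in ((0, 1), (1, 0), (1, 1), (1, -1)):
                shifted = img[y - r + dy:y + r + 1 + dy,
                              x - r + dx:x + r + 1 + dx]
                ssds.append(np.sum((win - shifted) ** 2))
            score[y, x] = min(ssds)
    return score
```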
5.2.2. Calibration Update
The essence of the calibration update method (i.e. identifying the instantaneous camera position and pointing parameters along the vehicle path) is the following. Consider two consecutive images (Frames N and N+1) taken from the vehicle during motion. The 3D coordinates of the Interest Points of Frame N are calculated from the DEM and the calibration matrix of Frame N. Having the 3D coordinates (x_i, y_i, z_i) of the Interest Points acquired from Frame N and their 2D coordinates (u_i, v_i) on Frame N+1 (from landmark tracking between Frame N and Frame N+1), a calibration can be performed to obtain the position and pointing parameters of Frame N+1. The calibration method keeps the intrinsic camera parameters fixed and yields only an update of the extrinsic camera parameters.
Assume the lens distortion has already been corrected to obtain perfect perspective geometry. Having a_u, a_v (the focal length divided by the pixel sizes in column and row directions) and u_0, v_0 (the coordinates of the principal point) as intrinsic parameters, and assuming an orthogonal pixel coordinate system, we use the perspective projection matrix for the pinhole camera model [30]:

\[ T := \begin{pmatrix} a_u r_{1,1} + u_0 r_{3,1} & a_u r_{1,2} + u_0 r_{3,2} & a_u r_{1,3} + u_0 r_{3,3} & a_u t_1 + u_0 t_3 \\ a_v r_{2,1} + v_0 r_{3,1} & a_v r_{2,2} + v_0 r_{3,2} & a_v r_{2,3} + v_0 r_{3,3} & a_v t_2 + v_0 t_3 \\ r_{3,1} & r_{3,2} & r_{3,3} & t_3 \end{pmatrix}, \qquad (4) \]

where (t_1, t_2, t_3) is the translation vector (camera position) and R = (r_{i,j}) is the rotation matrix

\[ R = \begin{pmatrix} \cos\varphi \cos\kappa & \cos\varphi \sin\kappa & -\sin\varphi \\ \sin\omega \sin\varphi \cos\kappa - \cos\omega \sin\kappa & \sin\omega \sin\varphi \sin\kappa + \cos\omega \cos\kappa & \sin\omega \cos\varphi \\ \cos\omega \sin\varphi \cos\kappa + \sin\omega \sin\kappa & \cos\omega \sin\varphi \sin\kappa - \sin\omega \cos\kappa & \cos\omega \cos\varphi \end{pmatrix} \qquad (5) \]

with the three pointing angles ω, φ, and κ.
The minimization of

\[ \sum_i \left( \xi_i^2 + \eta_i^2 \right) \qquad (6) \]

yields the position and pointing parameters to be calculated for Frame N+1, with

\[ \xi_i = u_i - \frac{t_{1,1} x_i + t_{1,2} y_i + t_{1,3} z_i + t_{1,4}}{t_{3,1} x_i + t_{3,2} y_i + t_{3,3} z_i + t_{3,4}} \qquad (7) \]

and

\[ \eta_i = v_i - \frac{t_{2,1} x_i + t_{2,2} y_i + t_{2,3} z_i + t_{2,4}}{t_{3,1} x_i + t_{3,2} y_i + t_{3,3} z_i + t_{3,4}} \qquad (8) \]

being the backprojection errors of the individual landmarks on Frame N+1.
The quality of the calibration is checked for each individual Interest Point by projecting its 3D coordinates back into the image using the calculated calibration. The deviation of the resulting 2D coordinates from the original image coordinates of the respective Interest Point is again calculated using Equations (7) and (8) and serves as a measure of consistency. The calibration is iterated, with the most inconsistent Interest Points removed, until this error falls below a certain threshold for all Interest Points.
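Equations (6)-(8) define a standard nonlinear least-squares problem over the six extrinsic parameters. A minimal sketch, assuming NumPy/SciPy; the paper does not prescribe a particular solver, and the helper names are hypothetical:

```python
from scipy.optimize import least_squares

def rotation(om, ph, ka):
    """Rotation matrix of Eq. (5) from the pointing angles."""
    so, co = np.sin(om), np.cos(om)
    sp, cp = np.sin(ph), np.cos(ph)
    sk, ck = np.sin(ka), np.cos(ka)
    return np.array([
        [cp * ck,                cp * sk,                -sp],
        [so * sp * ck - co * sk, so * sp * sk + co * ck,  so * cp],
        [co * sp * ck + so * sk, co * sp * sk - so * ck,  co * cp]])

def update_calibration(pts3d, pts2d, intr, x0):
    """Estimate position and pointing of Frame N+1 by minimizing the
    backprojection errors of Eqs. (6)-(8). pts3d: (n, 3) landmark
    coordinates from the DEM; pts2d: (n, 2) tracked (u_i, v_i);
    intr = (a_u, a_v, u_0, v_0); x0 = predicted (omega, phi, kappa,
    t1, t2, t3), e.g. from the auxiliary sensors."""
    au, av, u0, v0 = intr

    def residuals(x):
        R, t = rotation(*x[:3]), x[3:]
        cam = pts3d @ R.T + t                  # points in camera frame
        u = au * cam[:, 0] / cam[:, 2] + u0    # perspective projection,
        v = av * cam[:, 1] / cam[:, 2] + v0    # equivalent to using T of Eq. (4)
        return np.concatenate([pts2d[:, 0] - u, pts2d[:, 1] - v])

    return least_squares(residuals, x0).x
```

The outlier iteration described above would call this repeatedly, dropping the landmarks with the largest residuals until all backprojection errors fall below the threshold.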
6. SOFTWARE ORGANIZATION
The vehicle's on-board software should be designed as a combination of separate algorithmic solutions (software blocks) which minimises intermediate data exchange. It is especially important to separate those software blocks that need data from the vehicle sensors as input. All necessary processing shall be sorted into on-board and remote parts once the on-board computational resources are clarified. The software blocks which need data from the vehicle sensors as input should preferably be placed on-board. The computational complexity of the algorithms to be developed is a good starting point for a trade-off regarding the necessary on-board computational power.
The current software organization is depicted in Figure 4 (left). Table 2 gives rough estimates of the computational performance of the currently used modules on standard hardware.
Process                                   Parameters                          Processing Time
Stereo Matching HFVM                      image size 570 x 520, every pixel   40 sec
Locus Reconstruction (DEM/Ortho Image)    600 x 300 pixels                    20 sec
Risk Map Generation                                                           1 sec
Path Planning                             1 path                              1 sec
Tracking                                  200 landmarks                       5 sec
Calibration                               10 iterations                       5 sec

Table 2. Processing times of the simulation software modules (Pentium 166 MHz, Windows NT).
7. SIMULATION RESULTS AND ILLUSTRATIONS
This section describes data and processing results collected during sessions simulating vehicle operations on the Moon's surface. The hardware used for the simulation (CamRobII, Figure 4) includes an accurate robot holding two cameras that can be moved with 7 degrees of freedom within a 2 m x 1 m x 1.5 m volume. The motion is performed by 7 stepper motors. CamRobII is controlled by software running on a SPARC workstation. A software interface enables an operator to move the camera on an interactive-command basis, to capture images, and to store the video data together with the camera position and orientation data. In addition, CamRobII can be accessed via a programmable interface. A model of the Lunar terrain was placed in a 1.6 m x 2 m bed mockup. The whole setup is designed to simulate a scale factor of 1:10 with respect to reality.

Figure 4. l: Software components currently used for simulations. r: CamRobII camera motion device with a mockup simulating the lunar surface.
For the vehicle motion simulation session, both cameras were placed in the lowest position and directed forward and slightly downwards. The positions of the cameras are determined with respect to the world coordinate system. The correspondence between the CamRobII and world coordinate systems is given by a transformation matrix. The initial coordinates of the very first position of the left stereo camera and the stereo basis of the stereo pairs in the sequence are taken as known. The relative orientation of the stereo configuration based on stereo correspondences was performed using a fixed baseline between the cameras to obtain a definite scale factor. The following sequence of operations has been performed with the help of the system described above:
1. Stereo sequence image acquisition.
2. Stereo matching and DEM generation.
3. Risk map generation and local path planning.
4. Landmarks tracking and calibration update.
7.1. Input Image Acquisition
A long sequence of stereo images (40 pairs) was taken. The image frames with odd indices are taken by the left camera, whereas the even frames belong to the right camera (Figure 5). Both stereo cameras are set close to the mockup surface and directed slightly downward (15-20 degrees) to obtain a convergent, low-viewing-angle perspective stereo pair of the mockup terrain. After the first pair is acquired, both cameras are moved one step forward (15-20 mm) to capture the next stereo pair, and so on. The whole camera path is straight for the case presented in the following illustrations.
Figure 5. Simulation session scheme: every fifth stereo pair is used for the reconstruction of a new DEM.
7.2. DEM Generation
A general elevation model of the mockup terrain (ortho DEM) is generated on the basis of every fifth stereo pair (called the basic stereo pairs) of the sequence. The intermediate left image frames (i.e. those taken by the left camera) between subsequent basic stereo pairs are used for the tracking and calibration update of the left camera. The frequency of the basic stereo pairs is defined by the overlap between the reconstructed ortho DEMs necessary to keep the 3D coordinates of the underlying surface known (about 70%). The ortho DEMs calculated from the subsequent basic stereo pairs are merged to generate the entire ortho DEM of the underlying terrain. The stereo pairs are matched automatically (Figure 6), and the Locus method is applied for the 3D reconstruction. The DEM resolution in x and y is selected as 1 mm.
Figure 7 (left) depicts the merging result of the nine ortho images calculated from the nine subsequent basic stereo pairs. Occluded and undefined areas are marked white.
7.3. Path Planning
Path planning was done independently from the tracking simulation to demonstrate the robustness of the proposed approach for Moon-like terrain. A local path has been generated on the basis of a single reconstructed DEM. The generated local path is shown in Figure 7 (right). Hazardous areas unsuitable for vehicle motion are marked black on the DEM. The start and destination points of the vehicle path are specified by an operator. A safe path is generated automatically within the safe areas on the basis of the DEM slopes.
7.4. Tracking and Calibration Update
The goal of the landmark (Interest Point) tracking is to calculate the actual displacements between the CamRobII positions (calibration update) on the basis of the tracking information of each subsequent image frame. The calibration update results have been compared with the actual CamRobII coordinates. The image sequence used for the tracking and calibration update is composed of the 10 basic stereo pairs (20 frames) and 4 intermediate left image frames between each of them (40 frames). A relative calibration procedure based on stereo matching [10] is used to calculate the coordinates of the right camera for each basic stereo pair. This simulates the process of self-calibration in the case of slight mechanical changes of the stereo system.

Figure 6. Four subsequent stereo pairs (top) and the HFVM matching result of the first stereo pair. l: Row disparities, r: Column disparities, grey-level coded. Occluded areas around the image center rows caused matching errors.
A calibration update procedure [28] based on Interest Point (landmark) tracking is used to keep the coordinates of the 4 intermediate frames known. Each fourth intermediate frame forms the left image of the next basic stereo pair, thereby starting the next calibration loop.
The actual CamRobII trajectory used for the stereo sequence acquisition is a straight line. The offset between subsequent odd and even frames is 15 mm in the X direction and 30 mm in the Y direction (Figure 5). The stereo basis (SB) of the basic stereo pairs is equal to 97 mm. An example of the tracking paths between corresponding Interest Points over 4 subsequent frames is shown in Figure 8. The major parameters during Interest Point tracking are the number of landmarks and their backprojection error as a measure of the calibration consistency. The most inconsistent Interest Points are rejected and are not used for calibration. Experiments showed that the optimal number of landmarks necessary for a reliable calibration is about 200 points; however, a smaller set of landmarks (<200) still leads to robust tracking.
Figure 9 displays the trajectory calculated on the basis of the tracking and calibration update of the cameras. The position offsets are presented in Figure 9 (right). They show that the discrepancy between the CamRobII coordinates and the tracking positions has not accumulated considerably along the path. The fact that the Y offset values are always above 30 mm is explained by the uncertainty of the scale factor chosen with the estimated stereo baseline.
Figure 7. l: Ortho image merged from nine stereo configurations. r: Local path on the DEM. Elevations are grey coded, bright areas are high. Unknown and hazardous areas are marked black.

Figure 8. Four consecutive image frames and landmark tracking paths.

Figure 9. l: Camera trajectory (40 frames) as calculated on the basis of landmark tracking. The positions of the second stereo camera are included (every 5th frame). r, from top: x, y and z components of the displacement vectors between successive positions. All values in mm.
8. CONCLUSIONS
A closed-loop vision-based algorithmic solution for autonomous outdoor vehicle navigation has been described. It integrates the following algorithmic components into a sequential calculating chain:

1. Consecutive stereo image data acquisition.
2. 3D Digital Elevation Model reconstruction of the terrain in the anterior vicinity of the vehicle.
3. Risk map generation.
4. Local path planning.
5. Landmarks tracking.
6. Update of the vehicle position (calibration update).
The proposed VBN system benefits from two new features which have rarely been used to date by other autonomous vehicles:

- Arbitrary stereo geometry arrangement;
- Calibration update of vehicle positions based on image sequence tracking.
The algorithmic steps have been simulated with the help of an accurate robot (CamRobII, seven degrees of freedom) placed above a Lunar terrain mockup. Emphasis was put on the critical on-board calculations providing fully autonomous and robust vehicle operations.
The following statements can be made on the basis of the simulation sessions:

- 3D autonomous vehicle navigation on Moon-like terrain is feasible.
- The accuracy and robustness of each algorithmic component of the VBN system was shown on a realistic consecutive image sequence containing more than 40 frames.
- A long sequence of stable calibration update results without outside intervention demonstrated the high performance of the algorithms involved. These results are very important for accurate local path execution by the vehicle.
- Assessments of the necessary computational effort showed that the existing algorithms can be used for on-board implementation. Remote software can duplicate the on-board calculations at a higher level of accuracy to create a virtual-reality environment of the surrounding surface.
Reflex obstacle detection is the remaining aspect not integrated in the presented simulation; it needs further justification. Another aspect we did not deal with in our simulation is the rotation of the consecutive image frames taken along the path. Evidently, relative rotation between subsequent image frames is always present in real vehicle motion.

One additional experimental result is that the tracking procedure remains stable for relative rotations of up to 15 degrees. However, the rotation in a real image sequence can be well above this value even if images are taken frequently. One approach to solving this problem is to use a gyroscope for independent data about the instantaneous vehicle rotation. Having the relative rotation angles available, one can appropriately derotate each image frame before invoking the landmark tracking procedure.
The authors hope for an opportunity to validate the preliminary simulation results in real vehicle experiments.
REFERENCES
1. European Space Agency, "LEDA Assessment Report: LEDA-RP-95-02," tech. rep., ESTEC, Noordwijk, The Netherlands, June 1995.
2. Lavery, D. and Bedard, R.J., "1991 NASA Planetary Rover Program," in Proc. 42nd Congress of the IAF, IAF, (Montreal), Oct 5-11 1991.
3. Ozaki, T., Ohzora, M., and Kurahashi, K., "An Image Processing System for Autonomous Vehicle," in SPIE Vol. 1195 Mobile Robots IV, SPIE, 1989.
4. Thorpe, C.E. (ed.), Vision and Autonomous Navigation: The Carnegie Mellon NAVLAB, Kluwer Academic Publishers, 1990.
5. Ross, B., "A Practical Stereo Vision System," in 1993 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 148-153, IEEE Computer Society Press, (New York), June 15-18 1993.
6. Dhond, U.R. and Aggarwal, J.K., "A Cost Benefit Analysis of a Third Camera for Stereo Correspondence," International Journal of Computer Vision 6(1), pp. 39-58, 1991.
7. Proy, C., Lamboley, M., Sitenko, I., and Nguen, T.N., "Improving Autonomy of Marsokhod 96," in Proc. 44th Congress of the IAF, IAF, (Graz, Austria), Oct 16-22 1993. IAF-93-U.6.584.
8. Boissier, L., Horz, B., Proy, C., Faugeras, O., and Fua, P., "Autonomous Planetary Rover: On-Board Perception System Concept and Stereo Vision by Correlation," in Proc. IEEE International Conference on Robotics and Automation, 1992.
9. Kolesnik, M. et al., "Possibilities of Stereo Vision for Solving Scientific and Technological Tasks," in Proc. 2nd International Symposium on Missions, Technologies and Design of Planetary Rovers, CNES/RSA, (Moscow, St. Petersburg), May 15-21 1994.
10. Ulm, M. and Paar, G., "Relative Camera Calibration from Stereo Disparities," in Proc. 3rd Conference on Optical 3-D Measurement Techniques, ISPRS, (Vienna, Austria), October 2-4 1995.
11. Faugeras, O., Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
12. Lane, R.A., Thacker, N.A., and Seed, N.L., "Stretch-Correlation as a Real-Time Alternative to Feature-Based Stereo Matching Algorithms," Image and Vision Computing 12, pp. 203-212, May 1994.
13. Grimson, W.E.L., "Computational Experiments with a Feature-Based Stereo Algorithm," IEEE Trans. Pattern Anal. Machine Intell. 7, pp. 17-34, January 1985.
14. Jenkin, M.R.M., "Techniques for Disparity Measurement," Computer Vision, Graphics, and Image Processing 53, pp. 14-30, January 1991.
15. Medioni, G. and Nevatia, R., "Matching Images Using Linear Features," IEEE Trans. Patt. Anal. Mach. Intell. 6, pp. 675-685, November 1984.
16. Hoff, W. and Ahuja, N., "Surfaces from Stereo: Integrating Feature Matching, Disparity Estimation, and Contour Detection," IEEE Trans. Patt. Anal. Mach. Intell. 11, pp. 121-136, February 1989.
17. Ahuja, N. and Abbott, A.L., "Active Stereo: Integrating Disparity, Vergence, Focus, Aperture, and Calibration for Surface Estimation," IEEE Trans. Patt. Anal. Mach. Intell. 15, pp. 1007-1029, October 1993.
18. Grimson, W.E.L., From Images to Surfaces, MIT Press, Massachusetts, 1981.
19. Paar, G. and Pölzleitner, W., "Stereovision and 3D Terrain Modeling for Planetary Exploration," in Proc. 1st ESA Workshop on Computer Vision and Image Processing for Spaceborne Applications, European Space Research and Technology Centre, (Noordwijk, The Netherlands), June 1991.
20. Hannah, M.J., "A System for Digital Stereo Image Matching," Photogrammetric Engineering & Remote Sensing 55, pp. 1765-1770, December 1989.
21. Burt, P. and Adelson, E.H., "The Laplacian Pyramid as a Compact Image Code," IEEE Transactions on Communications 31, pp. 532-540, April 1983.
22. Paar, G. and Pölzleitner, W., "Robust Disparity Estimation in Terrain Modeling for Spacecraft Navigation," in Proc. 11th ICPR, International Association for Pattern Recognition, 1992.
23. Kweon, I.S. and Kanade, T., "High-Resolution Terrain Map from Multiple Sensor Data," IEEE Trans. Patt. Anal. Mach. Intell. 14, pp. 278-292, February 1992.
24. Bauer, A. and Paar, G., "Stereo Reconstruction from Dense Disparity Maps Using the Locus Method," in Proc. 2nd Conference on Optical 3-D Measurement Techniques, Gruen, A. and Kahmen, H., eds., pp. 460-466, ETH Zurich, Wichmann Verlag, (Zurich, Switzerland), October 4-7 1993.
25. Barraquand, J. and Latombe, J.C., "Robot Motion Planning: A Distributed Representation Approach," International Journal of Robotics Research 10, December 1991.
26. Kolesnik, M., "Vision and Navigation of Marsokhod Rover," in Proc. ACCV'95, pp. III-772 - III-777, Dec. 5-8 1995.
27. Dijkstra, E.W., "A Note on Two Problems in Connexion with Graphs," Numerische Mathematik 1, pp. 269-271, 1959.
28. Paar, G., Sidla, O., and Pölzleitner, W., "Natural Feature Tracking for Autonomous Navigation," in Proc. 28th International Dedicated Conference on Robotics, Motion and Machine Vision, ISATA, (Stuttgart, Germany), October 1995.
29. Moravec, H.P., "Towards Automatic Visual Obstacle Avoidance," in Proc. 5th International Joint Conference on Artificial Intelligence, (Cambridge, MA), 1977.
30. Pölzleitner, W. and Ulm, M., "Robust Dynamic 3D Motion Estimation Using Landmarks," in Optical Tools for Manufacturing and Advanced Automation: Videometrics II, 1993.