Algorithmic Solution for Autonomous Vision-Based Off-Road Navigation
Marina Kolesnik (a), Gerhard Paar (b), Arnold Bauer (b), Michael Ulm (c)

(a) Institute for Applied Information Technology, GMD-SET, Germany
    kolesnik@gmd.de, Fax ++49 2241 / 14 - 2072
(b) JOANNEUM RESEARCH, Graz, Austria
    gerhard.paar@joanneum.ac.at, Fax ++43 316 / 876 - 720
(c) University of Ulm, Germany
    ulm@mathematik.uni-ulm.de, Fax ++49 731 / 502 - 3579
ABSTRACT
A vision-based navigation system is a basic tool for providing autonomous operation of unmanned vehicles. For off-road navigation this means that a vehicle equipped with a stereo vision system, and perhaps a laser ranging device, shall be able to maintain a high level of autonomy under various illumination conditions and with little a priori information about the underlying scene. The task becomes particularly important for unmanned planetary exploration with the help of autonomous rovers. For example, in the LEDA Moon exploration project currently under study by the European Space Agency (ESA), the vehicle (rover) should perform the following operations during the autonomous mode: on-board absolute localization, digital elevation model (DEM) generation, obstacle detection and relative localization, and global path planning and execution.
The focus of this article is a computational solution for fully autonomous path planning and path execution. An operational DEM generation method based on stereoscopy is introduced. Self-localization on the DEM and robust natural feature tracking are used as basic navigation steps, supported by inertial sensor systems. The following operations are performed on the basis of stereo image sequences: 3D scene reconstruction, risk map generation, local path planning, camera position update during motion on the basis of landmark tracking, and obstacle avoidance.
Experimental verification is done with the help of a laboratory terrain mockup and a high-precision camera mounting device. It is shown that standalone tracking using automatically identified landmarks is robust enough to provide navigation data for further stereoscopic reconstruction of the surrounding terrain. Iterative tracking and reconstruction leads to a complete description of the vehicle path and its surroundings with an accuracy high enough to meet the specifications for autonomous outdoor navigation.
Keywords: Autonomous Navigation, Stereo Reconstruction, Vehicle, Stereo Matching, Computer Vision.
1. INTRODUCTION
Major advances in autonomous navigation based on computer vision techniques have opened the possibility of using autonomous rovers to explore the planets. The application of these recent technological achievements to planetary rovers is a challenging task outlined in the LEDA Moon exploration project [1]. Three roving control modes must be supported by the Vision-Based Navigation (VBN) system according to the LEDA mission objectives:

1. teleoperated mode, i.e. ground-based piloting;
2. medium autonomy mode, or autonomous piloting with ground-based navigation;
3. full autonomy mode, or autonomous navigation.

This work was supported in part by the Austrian Science Foundation (FWF) under grants S7003-MAT and M00265-MAT, and by JOANNEUM RESEARCH.
Stereo vision techniques offer a number of advantages in solving the task of autonomous vehicle navigation. They rely on low-cost video technology which uses little power; the sensor is mechanically reliable and emits no signature (unlike laser range-finding). A stereo system also allows more flexibility: most of the work during stereo range image production is performed by software, which can easily be adapted to a variety of situations. The same applies to the path planning software block: the logic of the path planning procedure can easily be changed depending upon the goal to be reached on a planetary surface.
There have been several attempts by different teams around the world to integrate stereo vision into rover navigation systems. The construction of the first wheeled rover that successfully used stereo, named Robby, was completed in December 1989 at JPL [2]. Another early stereo vehicle, named PVS, was developed by Nissan [3]. Faster computers and advanced algorithms have opened new perspectives in stereo vision. A robust stereo system has been realised on the robotic truck NAVLAB [4] developed by Carnegie Mellon University. More time-demanding trinocular stereo vision has been employed on the 8-legged walker Dante [5], also developed at Carnegie Mellon. The idea of using three stereo cameras was justified in the work by Dhond and Aggarwal [6], who found that the increase in computation due to a third camera was only 25% while the decrease in false matches was greater than 50%.
French teams have been working on autonomous navigation systems for several years. Their approach, proven on the Marsokhod rover [7], combined a chain of sequential operations such as stereo reconstruction, obstacle location and path planning. Nevertheless, both Dante and the French system imposed a parallel geometry condition through the stereo reconstruction algorithms used. This condition weakens the whole system, because parallel stereo geometry is difficult to arrange and maintain in practice. Even small distortions in epipolar stereo image geometry have an immediate impact on the reliability of stereo matching. Therefore, an additional rectification procedure must be employed beforehand [8].
This paper presents a concept for a fully autonomous mode of a rover equipped with an active/passive imaging sensor setup. Our approach combines both stereo- and mono-vision techniques. The 3D stereo reconstruction algorithm is based on almost arbitrary stereo geometry, which makes it stable against accidental distortions of the vision system arrangement. Tracking techniques are employed during the path execution step to control the vehicle locomotion along the local path. It can be stated that optimized versions of the algorithms involved in the closed-loop processing chain are suitable for on-board implementation. The reliability of the proposed algorithmic solution is demonstrated in simulation sessions with the help of an accurate robot and a Lunar terrain mockup.
2. SENSOR SYSTEMS
2.1. Stereo Vision System Arrangement and Calibration
The experience of several teams working on autonomous navigation has proved that parallel geometry of stereo cameras is difficult to arrange and maintain. A special platform has been designed to adjust and maintain parallel geometry for the three stereo cameras on "Dante", the autonomous vehicle for Antarctic applications [5]. In the case of the Russian Marsokhod rover [9], both stereo cameras were fixed on top of a special device performing accurate rotation about three axes.
Evidently, parallel stereo geometry can easily be distorted by vibrations during motion or by day/night temperature variations. Therefore, a 3D stereo reconstruction approach based on almost arbitrary stereo geometry is preferable and more reliable. Consequently, the necessary calibration procedure for the stereo system is composed of three steps:
1. Intrinsic parameter (focal length, principal point, lens distortion) calibration. It is performed on-ground during system assembly and is considered unchanged (or is recalculated depending upon the known temperature conditions) during vehicle operations.

2. Initial calibration of camera positions and orientations with respect to the vehicle frame. One important parameter is the distance between the stereo cameras, which can be considered unchanged. This sensor description is performed during system integration and can be used for 3D reconstruction of the environment and as a prediction during navigation. Moreover, a very accurate relative orientation between the cameras can be accomplished by utilizing results from stereo matching [10].

3. Online extrinsic (camera position and orientation) calibration. These parameters must be updated with respect to a given coordinate system while the vehicle is moving. Auxiliary sensors (Section 2.2) can help to avoid drifts and provide predictions for attitude parameters.
2.2. Auxiliary Sensors
A set of auxiliary sensors (measurement units) must be integrated in the vehicle navigation system to
provide additional independent information regarding vehicle position and orientation. These are:
- Magnetometer, to know the general orientation of the vehicle and to make it follow the desired direction.

- Wheel odometers and accelerometers, as additional control units. The vehicle operations are stopped immediately if the data obtained from these sensors exceed predefined safety thresholds.

- Star tracker, for independent self-localization of the vehicle on a planetary surface other than the Earth.

- Gyroscope / inclinometer, the only source of rotation data, used both for ortho DEM generation based on stereo matching and for the image sequence tracking procedure. The tracking itself does not presume any relative rotation between subsequent image frames. In case the relative rotation between two frames cannot be neglected, one can use the sensor rotation data to derotate the corresponding image frames.

- Laser range-finding (Ladar). This optional sensor can be effectively used for quick monitoring and obstacle detection in the close vicinity of the vehicle.
3. CLOSED LOOP SOLUTION FOR AUTONOMOUS NAVIGATION
3.1. Conditions
The vehicle Vision-Based Navigation (VBN) system must operate under a set of specific conditions and requirements which are present on a planetary surface, where immediate human intervention is in fact impossible. These are:

1. Automatic initial calibration of the vision sensors.
2. Absence of accurate reference points or landmarks for precise self-calibration.
3. Low-angle illumination conditions.
4. Low-angle viewing conditions.
5. No a priori information about the underlying terrain.
On the other hand, there are two conditions which simplify the operation of the vehicle VBN system:

1. Low speed of the vehicle operations. The vehicle can move in a stop-and-think mode, performing the time-consuming stereo reconstruction while it is stopped.
2. Static environmental conditions. The vehicle's VBN system operates in a still scene, as no moving objects are expected within its field of view.
3.2. Navigational Steps
To accomplish a fully autonomous vehicle navigation mode we suggest using a combination of stereo- and mono-vision techniques. Time-consuming stereo matching is employed during the stop-thinking vehicle mode to calculate a DEM, to classify the underlying scene, and to generate a safe local path. A less time-consuming tracking procedure is employed during the path execution mode to control the vehicle displacements along the path. Summarizing, we propose the following approach to accomplish the vehicle's autonomous mode:
1. Initialization phase. This step is performed only once to initialize the operational units of the vehicle. These are: initial calibration of the imagery sensors (laser, stereo cameras) and measuring units (wheel odometer / accelerometer / inclinometer sensors, gyroscope, star tracker), and self-localization of the vehicle position either with respect to a given global map of the surface or with respect to another vehicle.

2. Operational phase. This step consists of the following operations implemented in a cycle (a code sketch of the loop follows below):

   Stop-thinking mode:
   1. Stereo image acquisition;
   2. DEM reconstruction;
   3. Risk map generation;
   4. Local path generation;

   Path execution mode:
   5. Consecutive image acquisition;
   6. Landmarks tracking;
   7. Update of the vehicle position (calibration update).
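The control flow of the operational phase can be summarized as follows. This is a minimal sketch of the loop structure only; every name on `vehicle` (`reached`, `acquire_stereo_pair`, and so on) is a hypothetical placeholder, not an interface defined in this paper:

```python
def autonomous_cycle(vehicle, goal):
    """Operational-phase cycle of Section 3.2 (illustrative sketch;
    all vehicle methods are hypothetical placeholders)."""
    while not vehicle.reached(goal):
        # Stop-thinking mode: the vehicle stands still while the
        # time-consuming stereo chain runs.
        left, right = vehicle.acquire_stereo_pair()
        dem = vehicle.reconstruct_dem(left, right)        # Section 4.4
        risk = vehicle.risk_map(dem)                      # Section 4.6
        path = vehicle.plan_local_path(risk, dem, goal)   # Section 4.7
        # Path execution mode: move along the local path, tracking
        # landmarks and updating the calibration at each frame.
        for waypoint in path:
            vehicle.move_towards(waypoint)
            frame = vehicle.acquire_image()
            tracks = vehicle.track_landmarks(frame)       # Section 5.2.1
            vehicle.update_calibration(tracks, dem)       # Section 5.2.2
```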
4. 3D ENVIRONMENT RECONSTRUCTION AND PATH PLANNING
4.1. Reconstruction Principle
One of the most widely used methods for surface reconstruction in photogrammetry is stereoscopy. A scene point is projected into different locations in the stereo images. When the sensor geometry is known, these locations can be used to obtain the position of the point in 3D space. The reconstruction of entire surfaces is performed by applying this principle to many or all pixels of the stereo images provided. Machine vision for the stereoscopic reconstruction of surfaces can be split roughly into the following major steps:
1. Image orientation (Calibration) to describe the geometry of the sensing devices.
2. Image matching to get corresponding points in two or more images of the same scene.
3. 3D reconstruction from the correspondences to calculate points on the surface to be reconstructed.
In the following we use the ideal pinhole camera model [11].
4.2. Stereo Correspondence
To solve the local correspondence problem, a large variety of core matching methods have been published for many applications. None claims to offer an overall solution, but in specific areas the performance of the given methods is promising enough to prove feasibility. Probably the best known approach uses local correlation coefficients to describe local similarities [12]. To overcome the numerical complexity of a single match, some of the older implementations used Laplacians of Gaussians on selected points [13]. The idea of using local primitives evolved into a large variety of different local attributes like edges, corners or local phase [14]. Syntactic methods like labeling [15] or contour detection [16] further decrease the numerical effort, but introduce combinatoric complexity. Most of the newer approaches rely on hybrid algorithms or active vision [17].
Area-based matching methods are usually better suited to well-textured scenes with good visual similarity, but they are sensitive to occlusions and noise. Feature-based methods can better handle scenes with larger depth variations, but in most cases they need accurate geometric predictions (e.g. the epipolar constraint) to overcome ambiguities, especially in the case of repetitive patterns.
In general, the main problems of common stereo vision systems are as follows:

- Similarity in the 2D images is not an objective criterion for correspondence.
- Local constraints like the smoothness assumption, the ordering constraint, the rigidity assumption, or the condition of linear variation [18] are correlated with each other. Occlusions destroy some of these constraints, while others remain locally applicable.
- The computational effort is still very high when applying sophisticated hybrid algorithms for dense surface descriptions.
- The applications of state-of-the-art algorithms are very restricted.
One of the main reasons to advance the idea of using local features for matching, instead of area-based correlation or single-primitive methods, was the high density of matching points required for 3D surface reconstruction. The numerical description of several features and the definition of a statistical similarity measure result in the algorithm described in Section 4.3. It is shown in this paper that it can handle the correspondences used both for stereo reconstruction and for motion detection.
4.3. Hierarchical Feature Vector Matching (HFVM)
4.3.1. Feature Vector Matching (FVM)
Many published matching techniques deal with just one or, at most, two different properties of an image. These include grey levels, edges, corners, and other local primitives. A natural extension of this property-based matching philosophy is a combination of many of these features, which leads to a significant improvement of the stereo matching step, especially in terms of robustness. Such a method combines the advantages of several image features, whereas the particular disadvantages are compensated by the large variety of features. This approach to stereo matching is based upon the idea of creating a feature vector for each pixel and comparing these vectors in the images to be registered.
In the following, a feature is a value which numerically describes the neighborhood of a pixel location. Most of the features used here are described as convolutions or can be approximated by means of convolutions [19]. Calculating a certain feature for all pixels of an image results in a so-called feature image. In the following, a method is presented that matches pixels by comparing a number of features.
Suppose there are m features. All features of one location are collected in the feature vector \vec{f} for this pixel. From the contents of the feature images this vector can be derived for each pixel of the stereo image pair. Finding a match is performed by comparing a feature vector of the reference image, the reference vector, to all feature vectors of the search area, which is a part of the search image. The reference image and the search image are named r and s, respectively, and the 'images' consisting of the corresponding feature vectors \vec{r} and \vec{s}, respectively. Then, for a point p, \vec{r}(p) is the feature vector of p in the reference image and \vec{s}(p) the feature vector of p in the search image. The l-th component of a vector \vec{f} is denoted by f_l.
Table 1 lists a set of features currently in practical use.
In order to compare a reference vector to a search vector, the feature distance between the two vectors is computed. The feature distance is defined such that each component of the vectors is weighted. If the weight of feature l is denoted as w_l, then the feature distance between the vectors \vec{f} and \vec{g} is defined as the Euclidean distance:

\[ |\vec{f} - \vec{g}| = \sqrt{ \frac{\sum_{l=1}^{m} \left( (f_l - g_l)\, w_l \right)^2}{\sum_{l=1}^{m} w_l^2} } \qquad (1) \]
f_i   Property              Kernel                                  n_i   w_i
f0    Horizontal Highpass   (2 1 0 -1 -2)                            4     2
f1    Vertical Highpass     (2 1 0 -1 -2)^T                          4     2
f2    Horizontal Bandpass   (1 0 -1 0 1 0 -1)                        2     3
f3    Vertical Bandpass     (1 0 -1 0 1 0 -1)^T                      2     3
f4    Horizontal Bandpass   (2 1 0 -1 -2 -1 0 1 2 1 0 -1 -2)         8     3
f5    Vertical Bandpass     (2 1 0 -1 -2 -1 0 1 2 1 0 -1 -2)^T       8     3
f6    Local Variance [20]   -                                        4     -
f7    Gaussian              5x5 kernel below                        56    10

                            0 1 2 1 0
                            1 3 5 3 1
                            2 5 8 5 2
                            1 3 5 3 1
                            0 1 2 1 0

Table 1. FVM feature set (example). n_i are normalization factors, w_i are weights.
It has been shown that the absolute difference can be used instead of the Euclidean distance without loss of accuracy.
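A minimal sketch of feature image generation and the distance of Eq. (1), assuming NumPy/SciPy; the example kernels and normalization factors follow the reconstruction in Table 1, and all function names are illustrative:

```python
import numpy as np
from scipy.ndimage import convolve

def feature_images(img, kernels, norms):
    """One feature image per kernel (cf. Table 1), each divided by its
    normalization factor n_i; result has shape (m, rows, cols)."""
    return np.stack([convolve(img.astype(float), k) / n
                     for k, n in zip(kernels, norms)])

def feature_distance(f, g, w):
    """Weighted Euclidean feature distance of Eq. (1); the absolute
    difference mentioned above would work as well."""
    return np.sqrt(np.sum(((f - g) * w) ** 2) / np.sum(w ** 2))

# Example: the two highpass features of Table 1 (as reconstructed).
hp = np.array([[2, 1, 0, -1, -2]], dtype=float)
kernels = [hp, hp.T]
norms = [4.0, 4.0]
weights = np.array([2.0, 2.0])
```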
Computing, for a point p, the distance between \vec{r}(p) and each vector in the search image is in general too expensive. In practice, the search can be restricted to a certain search space Ω_p. This search space is defined by the search area, i.e., the center (i, j)_p (which is assumed to be given) and the extensions Δh and Δv (which are the same for all points):

\[ \Omega_p = \{\, \vec{s}(q) \mid q \in [i - \Delta h,\, i + \Delta h] \times [j - \Delta v,\, j + \Delta v] \,\} \qquad (2) \]

For a point p, the best correspondence is found at the position q_p in the search space where the distance between the reference vector and the search vector is minimal, i.e.,

\[ |\vec{r}(p) - \vec{s}(q_p)| = \min_{\vec{f} \in \Omega_p} |\vec{r}(p) - \vec{f}| \qquad (3) \]
The principle of Feature Vector Matching is depicted in Figure 1 (left). The algorithm is split into the following parts:

1. Create feature images for both the reference and the search image.
2. Compare each reference vector to all search vectors of the search space. The best correspondence is found where the feature distance is minimal. The difference in x- and y-coordinates is stored as the disparity vector. If the minimum feature distance exceeds a given threshold, the correspondence is invalid and the reference pixel is not matchable; as a result, the disparity for the reference pixel remains undefined.
3. Remove errors and interpolate undefined disparities.
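A brute-force sketch of one FVM match (Eqs. 2 and 3), reusing the hypothetical `feature_distance` helper from above:

```python
def fvm_match(ref_feat, srch_feat, p, center, dh, dv, w, d_max):
    """Match one reference pixel p: scan the search window around
    `center` in the search feature images, keep the position with
    minimal feature distance, and reject the match if that minimum
    exceeds the threshold d_max."""
    (pi, pj), (ci, cj) = p, center
    rows, cols = srch_feat.shape[1:]
    best_d, best_q = np.inf, None
    for qi in range(max(ci - dh, 0), min(ci + dh, rows - 1) + 1):
        for qj in range(max(cj - dv, 0), min(cj + dv, cols - 1) + 1):
            d = feature_distance(ref_feat[:, pi, pj],
                                 srch_feat[:, qi, qj], w)
            if d < best_d:
                best_d, best_q = d, (qi, qj)
    if best_d > d_max:
        return None                              # pixel not matchable
    return (best_q[0] - pi, best_q[1] - pj)      # disparity vector
```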
4.3.2. Subpixel Matching
Within the search space Ω_p of each reference pixel p (Eq. 2), the feature vector distances |\vec{r}(p) - \vec{s}(q)| (Eq. 3, q ∈ Ω_p) describe a small image. Around the minimum location q_min, the neighboring feature vector distances are used for a linear interpolation in each direction (row and column).

It turned out that the distribution of subpixel disparities between -0.5 and 0.5 is not uniform. Therefore, a lookup table is defined which maps subpixel values in the interval (-0.5, 0.5) onto itself to obtain a uniform distribution. This can be done in one learning step, i.e. applying FVM to some typically textured images and analyzing the histogram of the subpixel values.
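The paper interpolates linearly in each direction around the minimum; a symmetric V-shaped fit through the three neighboring distances is one concrete way to realize this, sketched below for one direction (the helper name is illustrative):

```python
def subpixel_offset(d_minus, d_min, d_plus):
    """Sub-pixel refinement in one direction (Sec. 4.3.2): fit a
    symmetric V through the distances at offsets -1, 0, +1 and return
    the offset of its minimum, which lies in (-0.5, 0.5)."""
    denom = 2.0 * (max(d_minus, d_plus) - d_min)
    if denom == 0.0:
        return 0.0                    # flat neighborhood: no refinement
    return (d_minus - d_plus) / denom
```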
4.3.3. Consistency Check
To measure the consistency of the disparities, matching from right to left is performed as well. This is called backmatching. To each point l of the left image, the left disparity map is applied. The result is r. Next, the right disparity map is applied to r, resulting in l'. The match is defined invalid when the distance between l and l' exceeds one pixel.
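A sketch of this check, assuming disparity maps stored as (2, rows, cols) arrays of row/column offsets (this layout is an assumption for illustration):

```python
def backmatch_valid(disp_lr, disp_rl, l, tol=1.0):
    """Backmatching (Sec. 4.3.3): map left pixel l into the right
    image via the left-to-right disparity, map the result back via
    the right-to-left disparity, and accept the match only if l is
    reproduced within `tol` (one pixel)."""
    li, lj = l
    ri = li + disp_lr[0, li, lj]
    rj = lj + disp_lr[1, li, lj]
    ki, kj = int(round(ri)), int(round(rj))      # nearest right pixel
    l2i = ri + disp_rl[0, ki, kj]
    l2j = rj + disp_rl[1, ki, kj]
    return np.hypot(l2i - li, l2j - lj) <= tol
```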
4.3.4. Hierarchical Feature Vector Matching
In order to evaluate the center of the search area for each point and to improve the robustness and efficiency of the matching algorithm, pyramids of the input images are generated [21]. Level 0 of the pyramid is the original image. To create the next level, the average grey level of four pixels in a square is computed and stored as one pixel of a new image. Matching starts at the top level of the pyramid with large search areas for each pixel. The resulting disparity map is smoothed, and undefined disparities are interpolated, before it is used as the initial disparity map (defining the centers of the search areas) for matching the next lower level of the pyramid.
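The pyramid construction described above (2x2 averaging) in a minimal sketch:

```python
def build_pyramid(img, levels):
    """Grey-level pyramid of Sec. 4.3.4: level 0 is the original
    image; each following level stores the average of 2x2 pixel
    squares of the level below."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        a = pyramid[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:h, :w]                         # crop to even size
        pyramid.append((a[0::2, 0::2] + a[0::2, 1::2] +
                        a[1::2, 0::2] + a[1::2, 1::2]) / 4.0)
    return pyramid
```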
Figure 1. l: Feature Vector Matching principle. r: HFVM from pyramid level N to N-1.
Incorporating pyramids, backmatching and filter algorithms leads to Hierarchical Feature Vector Matching, or HFVM [22]. The major steps are as follows (Figure 1, r):

1. Build the pyramid.
2. Compute the feature images for each pyramid level.
3. Match the top level of the pyramid.
4. Filter the resulting disparity map.
5. Check matching consistency by backmatching.
6. Interpolate the undefined disparities.
7. Use the resulting disparity map as the initial disparity map to match the next lower pyramid level.

Steps 4 through 7 are repeated until a disparity map of Level 0 is computed.
4.4. Stereo Reconstruction from Disparities
Traditional recovery of scene topography for stereo vision using spatial forward intersection is not satisfactory for many applications. Whenever corresponding image points are directly projected into Cartesian object space using triangulation, the resulting elevation description is sparse and nonuniform.

The fundamental concept of the Locus reconstruction approach [23] is to work in image space rather than in object space. It requires knowledge of dense correspondences between left and right image space. For this reason it is well suited to fully exploit the dense and uniform disparity maps computed with HFVM.
The elevation at an arbitrary reference position is found by intersecting a hypothetical line at this location with the object's surface (Figure 2). This is done by projecting this hypothetical line into the stereo images (left and right Locus). The image information at the Locus location characterizes a profile curve on the object's surface. The corresponding location in the other image is found by mapping the Locus using the given dense disparities. The elevation is determined by the most consistent intersection between the Locus and the profile curve projection, taking into account the known heights along the curves [24].

Figure 2. Stereo Locus method: relation between object and image space when projecting a hypothetical straight line, in the case of left-to-right disparity mapping.
For easy utilization of the reconstruction results it is important to be able to project the measured surface onto an analytically describable shape. The Locus reconstruction method is able to handle arbitrary global shapes of cavities: in object space, straight lines perpendicular to a reference surface are defined. For a dense reconstruction they are preferably arranged in a regular grid thereon. In general, the locations of interest can be selected without restriction on a reference surface in object space. This allows specific regions to be exploited individually with different reference resolution (2D on the reference shape) but full height resolution (perpendicular to the shape).
Utilizing simple neighborhood relationships and using a pyramid structure to gradually refine the reference surface resolution permits a drastic reduction of the search space for consistent elevations. The adaptation to the multiresolution disparity map structure coming from HFVM makes it possible to provide an early, rough elevation description for a general view. A consistency check allows the quality and accuracy of the elevation, the disparity map and the calibration to be evaluated. The Locus approach is less sensitive to noise in the disparities than the spatial forward intersection method.

The result is an elevation map projected onto the reference surface. A byproduct is an ortho image: the grey-level projection of the acquired images onto the reference surface.
4.5. Description of the Environment
The DEMs and ortho images are reconstructed in a world coordinate system considered fixed at the motion initialization point of the vehicle. Therefore, elevation models from different vehicle positions can be merged to obtain a full 3D description of the surroundings of the vehicle path. Figure 7 shows an example of a merging result of 9 subsequent stereo reconstructions.
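The paper does not detail the merging rule; one simple strategy, assumed here purely for illustration, is to resample the DEMs onto a common world grid and average overlapping cells:

```python
def merge_dems(dems):
    """Merge DEMs resampled to a common world grid (Sec. 4.5), as an
    assumed averaging scheme: undefined cells are NaN, overlapping
    defined cells are averaged, and cells undefined in every DEM
    remain NaN."""
    return np.nanmean(np.stack(dems), axis=0)
```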
4.6. Risk Map Generation
A risk map is generated on the basis of the Locus DEM (Section 4.4). The basic idea is to select steep slopes and elevated areas from the DEM and to mark them as hazardous and unsuitable for vehicle motion by comparison with a predefined safety threshold. The algorithm evaluates the minimal and maximal heights of the DEM in a local window and then calculates the local inclination. The optimized version of the algorithm takes O(N) calculations per pixel, where N is the local window size.
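A sketch of this idea using running min/max filters (which admit the O(N)-per-pixel behavior mentioned above); `window`, `cell_size` and `max_slope_deg` are illustrative parameters, not values from the paper:

```python
from scipy.ndimage import maximum_filter, minimum_filter

def risk_map(dem, window, cell_size, max_slope_deg):
    """Risk map sketch for Sec. 4.6: local relief inside an N x N
    window yields a local inclination; cells steeper than the safety
    threshold are marked hazardous."""
    relief = (maximum_filter(dem, size=window)
              - minimum_filter(dem, size=window))
    slope_deg = np.degrees(np.arctan2(relief, (window - 1) * cell_size))
    return slope_deg > max_slope_deg      # True marks hazardous cells
```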
4.7. Local Path Planning
A path based on a Voronoi diagram [25] keeps an equal distance from obstacles but is computationally expensive. Our idea is to consider the points of the risk map located outside hazardous areas as the nodes of a directed graph [26]. The path is generated from a predefined start point and should reach a specified target point on the local map in such a way as to minimize the length of a subgraph (Dijkstra's algorithm [27], Figure 3).
Figure 3. Path planning task: directed graph on the image field. Each node in image row Y can be connected only with the three nodes from the previous row Y+1.
The start point of the path, e.g. the current position of the vehicle, is considered the graph origin. The lengths of the graph edges are positive values (weights) defined as the height difference between subsequent vertices. As usual, the length of a particular path joining any two given vertices of the graph is defined as the sum of the edge weights composing the path. Since the real local path must run continuously through the image field, the graph edges must be connected in a directed way: each node of the graph can be connected only with the three nodes from the previous image row. Under this restriction, the number of operations for searching the shortest path from the start point to the possible destination points is O(N), where N is the image size.
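Under the three-predecessor restriction, the shortest-path search reduces to a row-by-row dynamic program, sketched below (row indices run forward here, whereas Figure 3 counts rows backward; the start handling is simplified to any feasible cell of the first row):

```python
def plan_path(height, hazardous):
    """Directed-graph shortest path of Sec. 4.7 as a dynamic program.
    Edge weights are absolute height differences between subsequent
    vertices; hazardous cells are impassable."""
    rows, cols = height.shape
    cost = np.full((rows, cols), np.inf)
    pred = np.full((rows, cols), -1, dtype=int)
    cost[0, ~hazardous[0]] = 0.0               # feasible start cells
    for y in range(1, rows):
        for x in range(cols):
            if hazardous[y, x]:
                continue
            for px in (x - 1, x, x + 1):       # three allowed predecessors
                if 0 <= px < cols and np.isfinite(cost[y - 1, px]):
                    c = cost[y - 1, px] + abs(height[y, x] - height[y - 1, px])
                    if c < cost[y, x]:
                        cost[y, x], pred[y, x] = c, px
    x = int(np.argmin(cost[-1]))               # cheapest feasible target
    path = [(rows - 1, x)]
    for y in range(rows - 1, 0, -1):
        x = pred[y, x]
        path.append((y - 1, x))
    return path[::-1]
```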
5. PATH EXECUTION AND NAVIGATION ON THE DEM
5.1. Initial Localization of the Vehicle
The accuracy of the initial localization of the vehicle in a given world coordinate system depends upon the accuracy of the available auxiliary sensor systems and/or a global map. After vehicle deployment, a local map with respect to the vehicle (its position specifying the origin of the local coordinate system) can be used on-board for further vehicle operations. The vehicle position and orientation shall be constantly monitored on-board in the local coordinate system. This can be done with the help of the vision system, as described in Section 5.2.2.
5.2. Landmark Tracking for Position Update
5.2.1. Landmark Selection and 2D Tracking
A landmark tracking algorithm [28] is used to follow the tracks of homologous points (Interest Points) in subsequent images. The corresponding displacements between the Interest Points are then used as the data base for the calibration update. To extract the Interest Points from the original image, a derivative of the Moravec operator [29] is used. Using auxiliary sensors (Section 2.2), it is only possible to make coarse predictions of the motion between two consecutive images (frames) taken while the vehicle is moving. Therefore, we propose a hierarchical approach to identify corresponding Interest Points in subsequent frames. In a first iteration, the disparities of all points of the image are calculated in coarse resolution. Subsequently, this information is used to calculate the disparities only for the Interest Points in high resolution, using Feature Vector Matching.
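The paper uses a derivative of the Moravec operator; the classical form of that interest measure is sketched below for orientation (a slow reference implementation, not the variant actually used):

```python
def moravec_score(img, r=2):
    """Classical Moravec interest measure [29]: for each pixel, the
    minimum over four shift directions of the sum of squared
    grey-level differences in a (2r+1)^2 window; distinct local
    maxima of this score serve as Interest Points."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    score = np.zeros((h, w))
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            win = img[y - r:y + r + 1, x - r:x + r + 1]
            ssds = []
            for dy, dx in ((0, 1), (1, 0), (1, 1), (1, -1)):
                shifted = img[y - r + dy:y + r + 1 + dy,
                              x - r + dx:x + r + 1 + dx]
                ssds.append(np.sum((win - shifted) ** 2))
            score[y, x] = min(ssds)
    return score
```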
5.2.2. Calibration Update
The essence of the calibration update method (i.e. identifying the instantaneous camera position and pointing parameters along the vehicle path) is the following. Consider two consecutive images (Frames N and N+1) taken from the vehicle during motion. The 3D coordinates of the Interest Points of Frame N are calculated from the DEM and the calibration matrix of Frame N. Having the 3D coordinates (x_i, y_i, z_i) of the Interest Points acquired from Frame N and their 2D coordinates (u_i, v_i) on Frame N+1 (from landmark tracking between Frame N and Frame N+1), a calibration can be performed to obtain the position and pointing parameters of Frame N+1. The calibration method keeps the intrinsic camera parameters fixed and yields only an update of the extrinsic camera parameters.
Assume the lens distortion has already been corrected to obtain perfect perspective geometry. Having a_u, a_v (the focal length divided by the pixel sizes in column and row directions) and u_0, v_0 (the coordinates of the principal point) as intrinsic parameters, and assuming an orthogonal pixel coordinate system, we use the perspective projection matrix for the pinhole camera model [30]:

\[ T := \begin{pmatrix} a_u r_{1,1} + u_0 r_{3,1} & a_u r_{1,2} + u_0 r_{3,2} & a_u r_{1,3} + u_0 r_{3,3} & a_u t_1 + u_0 t_3 \\ a_v r_{2,1} + v_0 r_{3,1} & a_v r_{2,2} + v_0 r_{3,2} & a_v r_{2,3} + v_0 r_{3,3} & a_v t_2 + v_0 t_3 \\ r_{3,1} & r_{3,2} & r_{3,3} & t_3 \end{pmatrix}, \qquad (4) \]

where (t_1, t_2, t_3) is the translation vector (camera position) and R = (r_{i,j}) is the rotation matrix

\[ R = \begin{pmatrix} \cos\varphi \cos\kappa & \cos\varphi \sin\kappa & -\sin\varphi \\ \sin\omega \sin\varphi \cos\kappa - \cos\omega \sin\kappa & \sin\omega \sin\varphi \sin\kappa + \cos\omega \cos\kappa & \sin\omega \cos\varphi \\ \cos\omega \sin\varphi \cos\kappa + \sin\omega \sin\kappa & \cos\omega \sin\varphi \sin\kappa - \sin\omega \cos\kappa & \cos\omega \cos\varphi \end{pmatrix} \qquad (5) \]

with the three pointing angles ω, φ, and κ.
The minimization of

\[ \sum_i \left( \xi_i^2 + \eta_i^2 \right) \qquad (6) \]

yields the position and pointing parameters to be calculated for Frame N+1, with

\[ \xi_i = u_i - \frac{t_{1,1} x_i + t_{1,2} y_i + t_{1,3} z_i + t_{1,4}}{t_{3,1} x_i + t_{3,2} y_i + t_{3,3} z_i + t_{3,4}} \qquad (7) \]

and

\[ \eta_i = v_i - \frac{t_{2,1} x_i + t_{2,2} y_i + t_{2,3} z_i + t_{2,4}}{t_{3,1} x_i + t_{3,2} y_i + t_{3,3} z_i + t_{3,4}} \qquad (8) \]

being the backprojection errors of the individual landmarks on Frame N+1.
The quality of the calibration is checked for each individual Interest Point by projecting its 3D coordinates back into the image using the calculated calibration. The deviation of the resulting 2D coordinates from the original image coordinates of the respective Interest Point is again calculated using Equations (7) and (8) and serves as a measure of consistency. The calibration is iterated, with the most inconsistent Interest Points removed, until this error falls below a certain threshold for all Interest Points.
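Equations (6)-(8) define a standard nonlinear least-squares problem over the six extrinsic parameters. A minimal sketch, assuming NumPy/SciPy; the paper does not prescribe a particular solver, and the helper names are hypothetical:

```python
from scipy.optimize import least_squares

def rotation(om, ph, ka):
    """Rotation matrix of Eq. (5) from the pointing angles."""
    so, co = np.sin(om), np.cos(om)
    sp, cp = np.sin(ph), np.cos(ph)
    sk, ck = np.sin(ka), np.cos(ka)
    return np.array([
        [cp * ck,                cp * sk,                -sp],
        [so * sp * ck - co * sk, so * sp * sk + co * ck,  so * cp],
        [co * sp * ck + so * sk, co * sp * sk - so * ck,  co * cp]])

def update_calibration(pts3d, pts2d, intr, x0):
    """Estimate position and pointing of Frame N+1 by minimizing the
    backprojection errors of Eqs. (6)-(8). pts3d: (n, 3) landmark
    coordinates from the DEM; pts2d: (n, 2) tracked (u_i, v_i);
    intr = (a_u, a_v, u_0, v_0); x0 = predicted (omega, phi, kappa,
    t1, t2, t3), e.g. from the auxiliary sensors."""
    au, av, u0, v0 = intr

    def residuals(x):
        R, t = rotation(*x[:3]), x[3:]
        cam = pts3d @ R.T + t                  # points in camera frame
        u = au * cam[:, 0] / cam[:, 2] + u0    # perspective projection,
        v = av * cam[:, 1] / cam[:, 2] + v0    # equivalent to using T of Eq. (4)
        return np.concatenate([pts2d[:, 0] - u, pts2d[:, 1] - v])

    return least_squares(residuals, x0).x
```

The outlier iteration described above would call this repeatedly, dropping the landmarks with the largest residuals until all backprojection errors fall below the threshold.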
6. SOFTWARE ORGANIZATION
The vehicle's on-board software should be designed as a combination of separate algorithmic solutions (software blocks) which minimises intermediate data exchange. It is especially important to separate those software blocks that need data from the vehicle sensors as input. All necessary processing shall be sorted into on-board and remote parts once the on-board computational resources are clarified. The software blocks which need data from the vehicle sensors as input should preferably be placed on-board. The computational complexity of the algorithms to be developed is a good starting point for a trade-off regarding the necessary on-board computational power.
The current software organization is depicted in Figure 4 (left). Table 2 gives rough estimates of the computational performance of the currently used modules on standard hardware.
Process                                   Parameters                          Processing Time
Stereo Matching HFVM                      image size 570 x 520, every pixel   40 sec
Locus Reconstruction (DEM/Ortho Image)    600 x 300 pixels                    20 sec
Risk Map Generation                                                           1 sec
Path Planning                             1 path                              1 sec
Tracking                                  200 landmarks                       5 sec
Calibration                               10 iterations                       5 sec

Table 2. Processing times of the simulation software modules (Pentium 166 MHz, Windows NT).
7. SIMULATION RESULTS AND ILLUSTRATIONS
This section describes data and processing results collected during sessions simulating vehicle operations on the Moon's surface. The hardware used for the simulation (CamRobII, Figure 4) includes an accurate robot holding two cameras that can be moved with 7 degrees of freedom within a 2 m x 1 m x 1.5 m volume. The motion is performed by 7 stepper motors. CamRobII is controlled by software running on a SPARC workstation. A software interface enables an operator to move the camera on an interactive-command basis, to capture images, and to store the video data together with the camera position and orientation data. In addition, CamRobII can be accessed via a programmable interface. A model of the Lunar terrain was placed in a 1.6 m x 2 m bed mockup. The whole setup is designed to simulate a scale factor of 1:10 with respect to reality.

Figure 4. l: Software components currently used for simulations. r: CamRobII camera motion device with a mockup simulating the lunar surface.
For the vehicle motion simulation session, both cameras were placed in the lowest position and directed forward and slightly downwards. The positions of the cameras are determined with respect to the world coordinate system. The correspondence between the CamRobII and world coordinate systems is given by a transformation matrix. The initial coordinates of the very first position of the left stereo camera and the stereo basis of the stereo pairs in the sequence are taken as known. The relative orientation of the stereo configuration based on stereo correspondences was performed using a fixed baseline between the cameras to obtain a definite scale factor. The following sequence of operations has been performed with the help of the system described above:
1. Stereo sequence image acquisition.
2. Stereo matching and DEM generation.
3. Risk map generation and local path planning.
4. Landmarks tracking and calibration update.
7.1. Input Image Acquisition
A long sequence of stereo images (40 pairs) was taken. The image frames with odd indices are taken by the left camera, whereas the even frames belong to the right camera (Figure 5). Both stereo cameras are set close to the mockup surface and directed slightly downward (15-20 degrees) to obtain a convergent, low-viewing-angle perspective stereo pair of the mockup terrain. After the first pair is acquired, both cameras are moved one step forward (15-20 mm) to capture the next stereo pair, and so on. The whole camera path is straight for the case presented in the following illustrations.
Figure 5. Simulation session scheme: every fifth stereo pair is used for the reconstruction of a new DEM.
7.2. DEM Generation
A general elevation model of the mockup terrain (ortho DEM) is generated on the basis of every fifth stereo pair (called the basic stereo pairs) of the sequence. The intermediate left image frames (i.e. those taken by the left camera) between subsequent basic stereo pairs are used for the tracking and calibration update of the left camera. The frequency of the basic stereo pairs is defined by the overlap between the reconstructed ortho DEMs necessary to keep the 3D coordinates of the underlying surface known (about 70%). The ortho DEMs calculated from the subsequent basic stereo pairs are merged to generate the entire ortho DEM of the underlying terrain. The stereo pairs are matched automatically (Figure 6), and the Locus method is applied for the 3D reconstruction. The DEM resolution in x and y is selected as 1 mm.
Figure 7 (left) depicts the merging result of the nine ortho images calculated from the nine subsequent basic stereo pairs. Occluded and undefined areas are marked white.
7.3. Path Planning
Path planning was done independently from the tracking simulation to demonstrate the robustness of the proposed approach for Moon-like terrain. A local path has been generated on the basis of a single reconstructed DEM. The generated local path is shown in Figure 7 (right). Hazardous areas unsuitable for vehicle motion are marked black on the DEM. The start and destination points of the vehicle path are specified by an operator. A safe path is generated automatically within the safe areas on the basis of the DEM slopes.
7.4. Tracking and Calibration Update
The goal of the landmark (Interest Point) tracking is to calculate the actual displacements between the CamRobII positions (calibration update) on the basis of the tracking information of each subsequent image frame. The calibration update results have been compared with the actual CamRobII coordinates. The image sequence used for the tracking and calibration update is composed of the 10 basic stereo pairs (20 frames) and 4 intermediate left image frames between each of them (40 frames). A relative calibration procedure based on stereo matching [10] is used to calculate the coordinates of the right camera for each basic stereo pair. This simulates the process of self-calibration in the case of slight mechanical changes of the stereo system.

Figure 6. Four subsequent stereo pairs (top) and the HFVM matching result of the first stereo pair. l: Row disparities, r: Column disparities, grey-level coded. Occluded areas around the image center rows caused matching errors.
A calibration update procedure [28] based on Interest Point (landmark) tracking is used to keep the coordinates of the 4 intermediate frames known. Each fourth intermediate frame forms the left image of the next basic stereo pair, thereby starting the next calibration loop.
The actual CamRobII trajectory used for the stereo sequence acquisition is a straight line. The offset between subsequent odd and even frames is 15 mm in the X direction and 30 mm in the Y direction (Figure 5). The stereo basis (SB) of the basic stereo pairs is equal to 97 mm. An example of the tracking paths between corresponding Interest Points over 4 subsequent frames is shown in Figure 8. The major parameters during Interest Point tracking are the number of landmarks and their backprojection error as a measure of the calibration consistency. The most inconsistent Interest Points are rejected and are not used for calibration. Experiments showed that the optimal number of landmarks necessary for a reliable calibration is about 200 points; however, a smaller set of landmarks (<200) still leads to robust tracking.
Figure 9 displays the trajectory calculated on the basis of the tracking and calibration update of the cameras. The position offsets are presented in Figure 9 (right). They show that the discrepancy between the CamRobII coordinates and the tracking positions has not accumulated considerably along the path. The fact that the Y offset values are always above 30 mm is explained by the uncertainty of the scale factor chosen with the estimated stereo baseline.
Figure 7. l: Ortho image merged from nine stereo configurations. r: Local path on the DEM. Elevations are grey coded, bright areas are high. Unknown and hazardous areas are marked black.

Figure 8. Four consecutive image frames and landmark tracking paths.

Figure 9. l: Camera trajectory (40 frames) as calculated on the basis of landmark tracking. The positions of the second stereo camera are included (every 5th frame). r, from top: x, y and z components of the displacement vectors between successive positions. All values in mm.
8. CONCLUSIONS
A closed-loop vision-based algorithmic solution for autonomous outdoor vehicle navigation has been described. It integrates the following algorithmic components into a sequential calculating chain:

1. Consecutive stereo image data acquisition.
2. 3D Digital Elevation Model reconstruction of the terrain in the anterior vicinity of the vehicle.
3. Risk map generation.
4. Local path planning.
5. Landmarks tracking.
6. Update of the vehicle position (calibration update).
The proposed VBN system benefits from two new features which have rarely been used to date by other autonomous vehicles:

- Arbitrary stereo geometry arrangement;
- Calibration update of vehicle positions based on image sequence tracking.
The algorithmic steps have been simulated with the help of an accurate robot (CamRobII, seven degrees of freedom) placed above a Lunar terrain mockup. Emphasis was put on the critical on-board calculations providing fully autonomous and robust vehicle operations.
The following statements can be made on the basis of the simulation sessions:

- 3D autonomous vehicle navigation on Moon-like terrain is feasible.
- The accuracy and robustness of each algorithmic component of the VBN system was shown on a realistic consecutive image sequence containing more than 40 frames.
- A long sequence of stable calibration update results without outside intervention demonstrated the high performance of the algorithms involved. These results are very important for accurate local path execution by the vehicle.
- Assessments of the necessary computational effort showed that the existing algorithms can be used for on-board implementation. Remote software can duplicate the on-board calculations at a higher level of accuracy to create a virtual-reality environment of the surrounding surface.
Reflex obstacle detection is the remaining aspect not integrated in the presented simulation; it needs further justification. Another aspect we did not deal with in our simulation is the rotation of the consecutive image frames taken along the path. Evidently, relative rotation between subsequent image frames is always present in real vehicle motion.

One additional experimental result is that the tracking procedure remains stable for relative rotations of up to 15 degrees. However, the rotation in a real image sequence can be well above this value even if images are taken frequently. One approach to solving this problem is to use a gyroscope for independent data about the instantaneous vehicle rotation. Having the relative rotation angles available, one can appropriately derotate each image frame before invoking the landmark tracking procedure.
The authors hope for an opportunity to validate the preliminary simulation results in real vehicle experiments.
REFERENCES
1. European Space Agency, "LEDA Assessment Report: LEDA-RP-95-02," tech. rep., ESTEC, Noordwijk, The Netherlands, June 1995.
2. Lavery, D. and Bedard, R.J., "1991 NASA Planetary Rover Program," in Proc. 42nd Congress of the IAF, IAF, (Montreal), Oct 5-11 1991.
3. Ozaki, T., Ohzora, M., and Kurahashi, K., "An Image Processing System for Autonomous Vehicle," in SPIE Vol. 1195 Mobile Robots IV, SPIE, 1989.
4. Thorpe, C.E. (ed.), Vision and Autonomous Navigation: The Carnegie Mellon NAVLAB, Kluwer Academic Publishers, 1990.
5. Ross, B., "A Practical Stereo Vision System," in 1993 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 148-153, IEEE Computer Society Press, (New York), June 15-18 1993.
6. Dhond, U.R. and Aggarwal, J.K., "A Cost Benefit Analysis of a Third Camera for Stereo Correspondence," International Journal of Computer Vision 6(1), pp. 39-58, 1991.
7. Proy, C., Lamboley, M., Sitenko, I., and Nguen, T.N., "Improving Autonomy of Marsokhod 96," in Proc. 44th Congress of the IAF, IAF, (Graz, Austria), Oct 16-22 1993. IAF-93-U.6.584.
8. Boissier, L., Horz, B., Proy, C., Faugeras, O., and Fua, P., "Autonomous Planetary Rover: On-Board Perception System Concept and Stereo Vision by Correlation," in Proc. IEEE International Conference on Robotics and Automation, 1992.
9. Kolesnik, M. et al., "Possibilities of Stereo Vision for Solving Scientific and Technological Tasks," in Proc. 2nd International Symposium on Missions, Technologies and Design of Planetary Rovers, CNES/RSA, (Moscow, St. Petersburg), May 15-21 1994.
10. Ulm, M. and Paar, G., "Relative Camera Calibration from Stereo Disparities," in Proc. 3rd Conference on Optical 3-D Measurement Techniques, ISPRS, (Vienna, Austria), October 2-4 1995.
11. Faugeras, O., Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
12. Lane, R.A., Thacker, N.A., and Seed, N.L., "Stretch-Correlation as a Real-Time Alternative to Feature-Based Stereo Matching Algorithms," Image and Vision Computing 12, pp. 203-212, May 1994.
13. Grimson, W.E.L., "Computational Experiments with a Feature-Based Stereo Algorithm," IEEE Trans. Pattern Anal. Machine Intell. 7, pp. 17-34, January 1985.
14. Jenkin, M.R.M., "Techniques for Disparity Measurement," Computer Vision, Graphics, and Image Processing 53, pp. 14-30, January 1991.
15. Medioni, G. and Nevatia, R., "Matching Images Using Linear Features," IEEE Trans. Patt. Anal. Mach. Intell. 6, pp. 675-685, November 1984.
16. Hoff, W. and Ahuja, N., "Surfaces from Stereo: Integrating Feature Matching, Disparity Estimation, and Contour Detection," IEEE Trans. Patt. Anal. Mach. Intell. 11, pp. 121-136, February 1989.
17. Ahuja, N. and Abbott, A.L., "Active Stereo: Integrating Disparity, Vergence, Focus, Aperture, and Calibration for Surface Estimation," IEEE Trans. Patt. Anal. Mach. Intell. 15, pp. 1007-1029, October 1993.
18. Grimson, W.E.L., From Images to Surfaces, MIT Press, Massachusetts, 1981.
19. Paar, G. and Pölzleitner, W., "Stereovision and 3D Terrain Modeling for Planetary Exploration," in Proc. 1st ESA Workshop on Computer Vision and Image Processing for Spaceborne Applications, European Space Research and Technology Centre, (Noordwijk, The Netherlands), June 1991.
20. Hannah, M.J., "A System for Digital Stereo Image Matching," Photogrammetric Engineering & Remote Sensing 55, pp. 1765-1770, December 1989.
21. Burt, P. and Adelson, E.H., "The Laplacian Pyramid as a Compact Image Code," IEEE Transactions on Communications 31, pp. 532-540, April 1983.
22. Paar, G. and Pölzleitner, W., "Robust Disparity Estimation in Terrain Modeling for Spacecraft Navigation," in Proc. 11th ICPR, International Association for Pattern Recognition, 1992.
23. Kweon, I.S. and Kanade, T., "High-Resolution Terrain Map from Multiple Sensor Data," IEEE Trans. Patt. Anal. Mach. Intell. 14, pp. 278-292, February 1992.
24. Bauer, A. and Paar, G., "Stereo Reconstruction from Dense Disparity Maps Using the Locus Method," in Proc. 2nd Conference on Optical 3-D Measurement Techniques, Gruen, A. and Kahmen, H., eds., pp. 460-466, ETH Zurich, Wichmann Verlag, (Zurich, Switzerland), October 4-7 1993.
25. Barraquand, J. and Latombe, J.C., "Robot Motion Planning: A Distributed Representation Approach," International Journal of Robotics Research 10, December 1991.
26. Kolesnik, M., "Vision and Navigation of Marsokhod Rover," in Proc. ACCV'95, pp. III-772 - III-777, Dec. 5-8 1995.
27. Dijkstra, E.W., "A Note on Two Problems in Connexion with Graphs," Numerische Mathematik 1, pp. 269-271, 1959.
28. Paar, G., Sidla, O., and Pölzleitner, W., "Natural Feature Tracking for Autonomous Navigation," in Proc. 28th International Dedicated Conference on Robotics, Motion and Machine Vision, ISATA, (Stuttgart, Germany), October 1995.
29. Moravec, H.P., "Towards Automatic Visual Obstacle Avoidance," in Proc. 5th International Joint Conference on Artificial Intelligence, (Cambridge, MA), 1977.
30. Pölzleitner, W. and Ulm, M., "Robust Dynamic 3D Motion Estimation Using Landmarks," in Optical Tools for Manufacturing and Advanced Automation: Videometrics II, 1993.