
Robotics and Autonomous Systems 44 (2003) 15–27

Learning compact 3D models of indoor and outdoor environments with a mobile robot

Dirk Hähnel a, Wolfram Burgard a,b,∗, Sebastian Thrun b

a Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
b School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

This paper presents an algorithm for full 3D shape reconstruction of indoor and outdoor environments with mobile robots. Data is acquired with laser range finders installed on a mobile robot. Our approach combines efficient scan matching routines for robot pose estimation with an algorithm for approximating environments using flat surfaces. On top of that, our approach includes a mesh simplification technique to reduce the complexity of the resulting models. In extensive experiments, our method is shown to produce accurate models of indoor and outdoor environments that compare favorably to other methods. © 2003 Elsevier Science B.V. All rights reserved.
Keywords: Map building; 3D mapping; Model simplification

1. Introduction

The topic of learning 3D models of buildings (exterior and interior) and man-made objects has received considerable attention over the past few years. 3D models are useful for a range of applications. For example, architects and building managers may use 3D models for design and utility studies using virtual reality (VR) technology. Emergency crews, such as fire fighters, could utilize 3D models for planning how to best operate at a hazardous site. 3D models are also useful for robots operating in urban environments. And finally, accurate 3D models could be a great supplement to the video game industry, especially if the model complexity is low enough for real-time VR rendering. In all of these application domains, there
∗ Corresponding author. Present address: Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany. Tel.: +49-761-203-8026/8006; fax: +49-761-203-8007. E-mail address: burgard@informatik.uni-freiburg.de (W. Burgard).

is a need for methods that can generate 3D models at low cost and with minimum human intervention.

In the literature, approaches for 3D mapping can be divided into two categories: approaches that assume knowledge of the pose of the sensors [1–5], and approaches that do not [6,7]. In the present paper, we are interested in using mobile robots for data acquisition; hence our approach falls into the second category due to the inherent noise in robot odometry. However, unlike the approaches in [6,7], which generate highly complex models, our focus is on generating low-complexity models that can be rendered in real time. The approach in [7], for example, composes models where the number of polygons is similar to the number of raw scans, which easily lies in the hundreds of thousands even for small indoor environments. The majority of existing systems also require human input in the 3D modeling process. Here we are interested in fully automated modeling without any human interaction. Our approach is also related to [8], which reconstructs planar models of indoor environments

0921-8890/03/$ – see front matter © 2003 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0921-8890(03)00007-1


using stereo vision, with some manual guidance in the reconstruction process to account for the lack of visible structure in typical indoor environments.

This paper presents an algorithm for generating simplified 3D models of indoor and outdoor environments. The data for generating these models are acquired by mobile robots equipped with laser range finders. To estimate the pose of the robot while collecting the data, a probabilistic scan matching algorithm is used. The resulting pre-filtered data is globally consistent but locally noisy. A recursive surface identification algorithm is then employed to identify large flat surfaces, thereby reducing the complexity of the 3D model significantly while also eliminating much of the noise in the sensor measurements. The resulting 3D models consist of large, planar surfaces, interspersed with small fine-structured models of regions that cannot be captured by a flat surface model.

The topic of simplification of polygonal models has long been studied in the computer graphics literature (see, e.g., [9–11]), often with the goal of devising algorithms for real-time rendering of complex models. There are two important characteristics of the data generated by robots that differ from the polygonal models studied in the computer graphics literature. First, robot data is noisy. The models studied in the computer graphics literature are usually assumed to be noise-free; hence, simplifications are only applied to increase the speed of rendering, not to reduce noise. This has important ramifications, as the noise in the data gives the initial 3D models a close-to-random fine structure. Second, built structure is known to contain large, flat surfaces that are typically parallel or orthogonal to the ground. Such a prior is usually not incorporated in polygon simplification algorithms. Consequently, a comparison with the algorithm presented in [9] illustrates that our approach yields more accurate and realistic-looking 3D models.

2. Computing consistent maps

Our current system is able to learn 2D and 3D maps using range scans recorded with a mobile robot. In both cases, the approach is incremental. Mathematically, we calculate a sequence of poses l̂_1, l̂_2, ... and corresponding maps by maximizing the marginal likelihood of the t-th pose and map relative to the (t−1)-th pose and map:

$$\hat l_t = \operatorname*{argmax}_{l_t}\left\{\, p(s_t \mid l_t, m(\hat l_{t-1}, s^{t-1})) \cdot p(l_t \mid u_{t-1}, \hat l_{t-1})\,\right\} \tag{1}$$

In this equation the term p(s_t | l_t, m(l̂_{t−1}, s^{t−1})) is the probability of the most recent measurement s_t given the pose l_t and the map m(l̂_{t−1}, s^{t−1}) constructed so far. The term p(l_t | u_{t−1}, l̂_{t−1}) represents the probability that the robot is at location l_t given that the robot previously was at position l̂_{t−1} and has carried out (or measured) the motion u_{t−1}. The resulting pose l̂_t is then used to generate a new map m via the standard incremental map updating function presented in [12]:

$$m(\hat l_t, s^t) = \operatorname*{argmax}_{m}\, p(m \mid \hat l_t, s^t) \tag{2}$$

The overall approach can be summarized as follows: at any point t−1 in time the robot is given an estimate of its pose l̂_{t−1} and a map m(l̂_{t−1}, s^{t−1}). After the robot has moved further on and taken a new measurement s_t, the robot determines the most likely new pose l̂_t. It does so by trading off the consistency of the measurement with the map (first term on the right-hand side of (1)) and the consistency of the new pose with the control action and the previous pose (second term on the right-hand side of (1)). The map is then extended by the new measurement s_t, using the pose l̂_t as the pose at which this measurement was taken.

It remains to describe how we actually maximize Eq. (1). Our system applies two different approaches depending on whether the scans to be aligned are two- or three-dimensional.

2.1. Two-dimensional scan alignment

Our algorithm to align a scan relative to the previous scans represented by the map m(l̂_{t−1}, s^{t−1}) is an extension of the approach presented in [7]. In our system, we additionally integrate over small Gaussian errors in the robot pose when computing the maps. This prevents many cells from remaining unknown, especially if the scans contain long beams; it also increases the smoothness of the likelihood function to be optimized and results in better alignments. To maximize the likelihood of a scan with respect to the given map, we apply a hill climbing strategy.
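The maximization of Eq. (1) can be made concrete with a short hill-climbing sketch in Python. This is a minimal illustration, not the authors' implementation: the score callback is assumed to return the logarithm of the product in Eq. (1), and the step sizes and (x, y, θ) parametrization are illustrative choices.

```python
import math

def hill_climb_pose(score, pose0, step=(0.05, 0.05, math.radians(0.5)),
                    min_step=1e-3):
    """Greedy maximization of a pose score over (x, y, theta).

    `score(pose)` is assumed to return
    log p(s_t | pose, map) + log p(pose | u_{t-1}, previous pose),
    i.e. the log of the objective in Eq. (1).
    """
    pose, best = list(pose0), score(tuple(pose0))
    dx, dy, dth = step
    while max(dx, dy, dth) > min_step:
        improved = False
        for i, d in enumerate((dx, dy, dth)):
            for sign in (1.0, -1.0):
                cand = list(pose)
                cand[i] += sign * d
                s = score(tuple(cand))
                if s > best:                 # move to the better neighbor
                    pose, best, improved = cand, s, True
        if not improved:                     # local maximum: refine the steps
            dx, dy, dth = dx / 2, dy / 2, dth / 2
    return tuple(pose), best
```

Starting the search from the odometry-predicted pose keeps the climb inside the correct basin of attraction in most cases.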


Fig. 1. Two-dimensional scan alignment. Map created out of the previous scans (left image), measurement s_t obtained at time t (center image) and resulting alignment (right image).

As we found in practice, this approach yields highly accurate position estimates and results in very accurate maps. Fig. 1 shows one example, in which a range scan is aligned with the given map. A typical map obtained with our approach is shown in Fig. 2. The size of the corresponding environment is 50 m × 14 m.

2.2. Aligning three-dimensional range scans

Unfortunately, three-dimensional variants of the maps and likelihood functions described above would consume too much memory. Therefore, this approach is not applicable to 3D scan alignment. Instead, we represent 3D maps as triangle meshes constructed from the individual scans. We create a triangle for triples of neighboring scan points if the maximum length of an edge does not exceed a certain threshold, which depends on the length of the beams. To compute the most likely position of a new 3D scan with respect to the current 3D model, we apply an approximate physical model of the range scanning

process [13]. Obviously, an ideal sensor would always measure the correct distance to the closest obstacle in the sensing direction. However, sensors and models generated out of range scans are noisy. Therefore, our system incorporates measurement noise and random noise in order to deal with errors typically found in 3D range scans. First, we generally have normally distributed measurement errors around the distance expected according to the current position of the scanner and the given model of the environment. Additionally, we observe randomly distributed measurements because of errors in the model and because of deviations in the angles between corresponding beams in consecutive scans. Therefore, our model consists of a mixture of a Gaussian with a uniform distribution. The mode of the Gaussian corresponds to the distance expected given the current map. Additionally, we use a uniform distribution to deal with maximum range readings. To save computation time, we approximate the resulting distribution by a mixture of triangular distributions (see Fig. 3).
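A sketch of this mixture, with the Gaussian replaced by the triangular approximation of Fig. 3, could look as follows. The mixture weights, the noise level sigma and the function names are assumptions for illustration; the paper does not publish these values.

```python
def tri(x, center, half_width):
    """Triangular density centered at `center`, integrating to 1."""
    d = abs(x - center)
    return max(0.0, (1.0 - d / half_width) / half_width)

def beam_likelihood(z, expected, z_max, sigma=0.05,
                    w_hit=0.7, w_rand=0.2, w_max=0.1):
    """Per-beam measurement model: a triangular bump around the expected
    distance (the piecewise linear stand-in for the Gaussian), a uniform
    component for random measurements, and a point mass for max-range
    readings. All parameters are assumed, illustrative values."""
    p_hit = tri(z, expected, 3.0 * sigma)
    p_rand = 1.0 / z_max
    p_maxr = 1.0 if z >= z_max else 0.0
    return w_hit * p_hit + w_rand * p_rand + w_max * p_maxr
```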

Fig. 2. Two-dimensional map of Sieg Hall at the University of Washington, Seattle, constructed out of 8013 2D range scans.


Fig. 3. The probabilistic measurement model given as a mixture of a Gaussian and a uniform distribution and its approximation by piecewise linear functions.

Whereas this approach saves memory, it requires more computation time than the technique for 2D scan alignment. However, in practical experiments we found that this technique has two major advantages over the iterative closest point (ICP) algorithm [14,15] and other similar scan matching techniques. First, it exploits the fact that each laser beam is a ray that does not go through surfaces and therefore does not require special heuristics for dealing with occlusions. Second, our approach also exploits the information provided by maximum range readings. For example, if such a beam goes through surfaces in the map, it reduces the likelihood of the current alignment. To compute the likelihood of a beam b given the current map m(l̂_{t−1}, s^{t−1}), we first determine the expected distance e(b, m(l̂_{t−1}, s^{t−1})) to the closest obstacle in the measurement direction. This is efficiently carried out using ray-tracing techniques based on a spatial tiling and indexing [16] of the current map. Then we compute the likelihood of the measured distance given

the expected distance, i.e. we determine the quantity p(b | e(b, m(l̂_{t−1}, s^{t−1}))) using the mixture computed for e(b, m(l̂_{t−1}, s^{t−1})). Assuming that the beams contained in s_t are independent, we compute the likelihood of the whole scan as

$$p(s_t \mid l_t, m(\hat l_{t-1}, s^{t-1})) = \prod_{b \in s_t} p(b \mid e(b, m(\hat l_{t-1}, s^{t-1}))) \tag{3}$$
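In log space the product of Eq. (3) becomes a sum, which is how one would evaluate it in practice. A minimal sketch, assuming per-beam likelihoods such as the `beam_likelihood` function above and precomputed ray-traced distances:

```python
import math

def scan_log_likelihood(measured, expected, z_max, beam_likelihood):
    """Log of Eq. (3). `measured` are the beam ranges of s_t and
    `expected` the ray-traced distances e(b, m(...)) for the candidate
    pose; beams are treated as independent, so logs are summed."""
    total = 0.0
    for z, e in zip(measured, expected):
        # floor the likelihood to keep the log finite for outliers
        total += math.log(max(beam_likelihood(z, e, z_max), 1e-300))
    return total
```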

To maximize Eq. (1) we again apply a hill climbing technique.

2.3. Generating raw 3D data

To record the 3D data used throughout this paper, we used two different mobile platforms with different kinds of laser configurations (see Fig. 4). In the first configuration the robot carries two lasers. Whereas the first laser scans horizontally, the second laser is

Fig. 4. The platforms used for acquiring the 3D data. Indoor system with two lasers (left) and outdoor system with one laser mounted on a pan/tilt unit (right).


Fig. 5. Model learned for a fraction of the Wean Hall at Carnegie Mellon University (left) and fractions of the raw data for parts of the wall (center) and the ceiling (right).

upward-pointed (see left image in Fig. 4). This robot, which is used for indoor mapping, uses the front laser to map an unknown environment in 2D and to recover the robot's pose. At the same time the upward-pointed laser scans the 3D structure of the environment. The robot depicted in the right image of Fig. 4 is used for outdoor environments only. It carries a single laser scanner that is mounted on a pan/tilt unit, allowing the robot to dynamically change the scanning direction. With both robots we obtain a polygonal model by connecting consecutive 3D points. To avoid closing holes coming from doorways etc., we only create a polygonal surface if the consecutive points are close to each other. A typical model resulting from this process for the Wean Hall at Carnegie Mellon University is depicted in Fig. 5. This data set was recorded with the robot depicted in the left image of Fig. 4.
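The construction of the triangle mesh from consecutive scans can be sketched as follows. The regular grid layout of the points and the fixed 20 cm edge threshold are simplifying assumptions; as noted above, the actual threshold depends on the length of the beams.

```python
import numpy as np

def mesh_from_scan_grid(points, max_edge=0.2):
    """Triangulate a (rows x cols x 3) grid of scan points; rows are
    consecutive scans, columns are beams. A triangle is emitted only if
    all of its edges are shorter than `max_edge`, which avoids bridging
    doorways and other depth discontinuities."""
    rows, cols, _ = points.shape
    def short_edges(a, b, c):
        return all(np.linalg.norm(points[p] - points[q]) < max_edge
                   for p, q in ((a, b), (b, c), (c, a)))
    triangles = []
    for i in range(rows - 1):
        for j in range(cols - 1):
            for tri in (((i, j), (i + 1, j), (i, j + 1)),
                        ((i + 1, j), (i + 1, j + 1), (i, j + 1))):
                if short_edges(*tri):
                    triangles.append(tri)
    return triangles
```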

3. Learning smooth 3D models

Although the position estimation techniques described in Sections 2.1 and 2.2 produce accurate position estimates, the resulting models often lack the appropriate smoothness. Fig. 5 shows, in detail, a model of a corridor including parts of a doorway (see center image) and the ceiling (see right image). As can easily be seen, the data is extremely rugged. Whereas some of the ruggedness arises from remaining errors in the pose estimation, the majority of the errors stems from measurement noise in the laser range finders. However, the key characteristic here is that all noise is local, as the scans have been globally aligned by the 2D mapping algorithm. As a result, global structures cannot be extracted by considering small areas of the data. Rather, one has to scan larger fractions of the model in order to find such structures. As an example, consider the fractions of the doorway and the ceiling depicted in the right two images of Fig. 5. Although the corresponding objects are planar in the real world, this structure cannot be extracted from the local surfaces. Fig. 6 shows the surface normals for 5000 surfaces of the walls and the ceiling partly shown in Fig. 5. As can be seen from the figure, the normals are almost uniformly distributed.

Please note that this problem is inherent to the sensor-based acquisition of high-resolution 3D models. In order to scan an object at high resolution, the distance between consecutive scanning positions must be sufficiently small. However, the smaller the distance between consecutive scanning positions, the higher the influence of the measurement noise on the deviation between the surface normals of neighboring shapes (and also from the true surface normal). Since approximations of larger structures cannot be found by a local analysis, more exhaustive techniques are required. In our system, we apply a randomized search technique to find larger planar structures in the data. If such a planar structure is found, our approach maps the corresponding 3D points onto this planar surface. In a second phase, neighboring surfaces in the mesh which lie on the same plane and satisfy further constraints described below are merged into larger polygons.

3.1. Planar approximation of surfaces

The algorithm to find planes for sets of points is a randomized approach. It starts with a randomly chosen


Fig. 6. Normed surface normals for a ceiling (left) and a wall (right).

point in 3D and applies a region growing technique to find a maximal set of points in the neighborhood to which a fitting plane can be found. As the optimal plane we choose the plane which minimizes the sum of the squared distances to the points v_i in the current region. The normal of this plane is given by the eigenvector corresponding to the smallest eigenvalue of the 3 × 3 matrix

$$A = \sum_{i=1}^{n} (v_i - m)(v_i - m)^{T}, \qquad \text{where } m = \frac{1}{n}\sum_{i=1}^{n} v_i \tag{4}$$

is the center of mass of the points v_i. The minimum eigenvalue corresponds to the sum of the squares of the distances between the plane and the points v_i.

Our approach proceeds as follows. It starts with a random point v_1 and its nearest neighbor v_2. It then repeatedly tries to extend the current set of points Π by considering all other points in increasing distance from this point set. Suppose v' is a point such that the point distance pointDist(Π, v') between v' and one point in Π is less than δ (which is 30 cm in our current implementation). If the average squared error error(Π ∪ {v'}) to the optimal plane for Π ∪ {v'} is less than ε (which was 2.8 in all our experiments) and if the distance of v' to the optimal plane for Π ∪ {v'} is less than γ (γ = 10 cm in our implementation), then v' is added to Π. As a result, regions are grown to include nearby points regardless of the surface normals of the polygons neighboring these points (which are assumed to be random). This process is described more precisely in Table 1.

Table 1
The plane extraction algorithm

  Select point tuple v_1, v_2
  Π := {v_1, v_2}
  WHILE (new point can be found)
      Select point v' with pointDist(Π, v') < δ
      IF error(Π ∪ {v'}) < ε AND dist(Π ∪ {v'}, v') < γ
          Π := Π ∪ {v'}
  END WHILE

To find the best planes, this process is restarted for different randomly chosen starting points v_1 and v_2. Our approach always selects the largest plane found in each round. If no further plane can be found, the overall process is terminated.
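The plane fit of Eq. (4) and the region growing of Table 1 can be sketched as follows. The array layout, function names and the metric value of the error bound `eps` are assumptions (the paper states the bound as 2.8 without units); the 30 cm and 10 cm thresholds follow the text.

```python
import numpy as np

def fit_plane(pts):
    """Optimal plane for Eq. (4): the normal is the eigenvector of the
    3x3 scatter matrix with the smallest eigenvalue; also returns the
    mean squared point-to-plane distance as the fit error."""
    m = pts.mean(axis=0)
    d = pts - m
    w, v = np.linalg.eigh(d.T @ d)        # eigenvalues in ascending order
    return v[:, 0], m, w[0] / len(pts)

def grow_plane_region(points, i0, delta=0.30, eps=3e-4, gamma=0.10):
    """Region growing after Table 1 on an (N, 3) point array. Naive
    O(N^2) candidate loop; Section 3.3 describes how the search space
    is reduced in practice."""
    order = np.argsort(np.linalg.norm(points - points[i0], axis=1))
    region = [i0, int(order[1])]           # seed point and nearest neighbor
    for j in (int(k) for k in order[2:]):  # try points by increasing distance
        if np.linalg.norm(points[region] - points[j], axis=1).min() >= delta:
            continue                       # no point of the region close enough
        n, m, err = fit_plane(points[region + [j]])
        if err < eps and abs(np.dot(points[j] - m, n)) < gamma:
            region.append(j)
    return region, fit_plane(points[region])
```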

3.2. Merging of surfaces

In a second phase, neighboring polygons belonging to the same plane are merged into larger polygons. A polygon belongs to a plane if all of its edges belong to this plane. Two polygons of the same plane can be combined if: (1) both polygons have exactly one sequence of common edges; and (2) both polygons do not overlap. Our approach repeatedly performs this merging process until there do not exist any further polygons that can be merged.


Please note that both conditions are sufficient to ensure that each merging operation leads to a valid polygon. Furthermore, the resulting polygons are not necessarily convex, i.e. our approach does not close holes in the model coming from doors or windows, in contrast to the technique described in [6].

Obviously, our approach solves a mesh simplification problem that has been studied thoroughly in the computer graphics literature. The important difference between our approach and mesh simplification algorithms from computer graphics, such as [9,10], lies in the way the input data is processed. In contrast to our method, which tries to fit a plane to a larger set of points, the techniques presented in [9,10] only perform a local search and consider only pairs of surfaces. Neighboring surfaces are simplified by minimizing an error or energy function which specifies the visual discrepancy between the original model and the simplified model in terms of discontinuities in the surface normals. Because of the local noise in our data, these techniques cannot distinguish between areas with a higher level of detail, such as corners, and areas with few details, such as planar structures corresponding to walls. Thus, the simplification is carried out uniformly over the mesh. Our approach, in contrast, simplifies planar structures and leaves a high level of detail where it really matters.

3.3. Improving the efficiency

One of the major problems with the approach described above is its time complexity. For a typical data set consisting of 200,000 surfaces, a naive implementation on a standard PC requires over 10 h to extract all planes. For environments like the ones considered

here, a major issue therefore is the reduction of the overall search space. To speed up the plane extraction process, our system extracts lines from the individual two-dimensional range scans using the split-and-merge technique, which has been applied with great success in the past [17]. Given the Hessian normal form of the extracted lines, we compute a histogram of the line angles. Thereby each line is weighted proportionally to the number of points that belong to it. For the system with two range scanners, we compute a separate histogram for each scanner. Since the scans are obtained asynchronously, we treat both histograms independently and compute the histogram for possible plane directions based on the product of these two distributions. Fig. 8 shows the histograms obtained for a fraction of the Wean Hall data set (see Fig. 5). Whereas the left image shows the predominant angles found in the horizontal scans, the right image shows the same for the vertical scanner.

To extract the planes, our approach proceeds as follows. For each local maximum found in the histogram for vertical lines and each peak in the histogram for horizontal scans, we construct the corresponding plane in 3D. We then sweep this plane along its normal through the data set (see Fig. 7). For each possible plane, we perform two different sweeps with two different discretizations of the plane positions along its normal. The first sweep is carried out using a discretization of 5 cm. The second sweep is then carried out with a finer resolution of 1 cm for every peak found during the first sweep.
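The coarse-to-fine sweep can be sketched as a pair of histogram passes over the signed distances of all points along the plane normal. The function names and the strict-local-maximum peak test are assumptions; the 5 cm and 1 cm discretizations follow the text.

```python
import numpy as np

def sweep_peaks(points, normal, step, lo=None, hi=None):
    """Histogram the projections of `points` onto `normal` with bin
    width `step` and return the bin centers that are local maxima,
    i.e. candidate plane positions along the sweep."""
    proj = points @ normal
    if lo is None or hi is None:
        lo, hi = proj.min(), proj.max()
    edges = np.arange(lo, hi + step, step)
    counts, _ = np.histogram(proj, bins=edges)
    centers = edges[:-1] + step / 2
    is_peak = ((counts > np.r_[counts[1:], 0]) &
               (counts > np.r_[0, counts[:-1]]))
    return centers[is_peak]

def coarse_to_fine(points, normal):
    """First sweep at 5 cm, then re-sweep a 1 cm grid around each
    coarse peak, as described in the text."""
    planes = []
    for c in sweep_peaks(points, normal, 0.05):
        planes.extend(sweep_peaks(points, normal, 0.01, c - 0.05, c + 0.05))
    return planes
```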

Fig. 7. Sweeping a plane through a point set (left) and computing a virtual vertical 2D-scan from a 3D scan (right).


Fig. 8. Histogram of the angles of line segments found in all scans of the horizontal (left image) and vertical (right image) scanner.

As an example, consider the histograms shown in Fig. 9. The left image of this figure shows a plot of the number of points that are closer than 40 cm to the corresponding plane for different positions of the plane along its surface normal. This histogram was obtained for the first plane candidate according to the histograms depicted in Fig. 8. The right image of this figure shows the histogram obtained using a high-resolution sweep with a discretization of 1 cm around the first peak in the coarse-resolution histogram. As can be seen, the finer resolution reveals a second peak. This additional peak comes from the doors in the environment, which constitute planes that are slightly displaced from the walls. For each peak in the high-resolution histogram, we collect all data points which are close to the corresponding plane and apply the plane extraction process described above. Because the number of points close to a plane generally is much smaller than the overall number of points, we obtain an enormous speed-up of more than one order of magnitude.

Please note that the same technique can be applied to 3D data gathered with the laser mounted on the pan/tilt unit. However, it requires certain geometric transformations to correctly identify vertical and horizontal lines in these data. For the sake of brevity, we only present the corresponding equations for the extraction of the angles of vertical lines. To compute these angles, we transform the data into a set of virtual vertical 2D-scans. Fig. 7 shows the different parameters characterizing a single beam d of a 3D scan. Whereas φ denotes the angle between d and the center beam y of the laser within the currently scanned plane, ψ corresponds to the tilt of the laser scanner. To efficiently compute the vertical 2D-scans, we sort all beams according to their vertical angle θ and the angle α between their projection onto the horizontal plane and ỹ, which is the projection of y onto the

Fig. 9. Histogram of the points belonging to a plane for one sweep through the point set (left image) and histogram obtained by a high-resolution sweep for the first peak (right image).


horizontal plane. These two quantities can be computed from φ and ψ by the following geometric computations:

$$x = d \sin\varphi, \qquad \tilde y = d \cos\varphi, \qquad y = \tilde y \cos\psi, \qquad z = \tilde y \sin\psi.$$

Accordingly, tan α and sin θ can be computed as follows:

$$\tan\alpha = \frac{x}{y} = \frac{d \sin\varphi}{\tilde y \cos\psi} = \frac{d \sin\varphi}{d \cos\varphi \, \cos\psi} = \frac{\tan\varphi}{\cos\psi},$$

$$\sin\theta = \frac{z}{d} = \frac{\tilde y \sin\psi}{d} = \frac{d \cos\varphi \, \sin\psi}{d} = \cos\varphi \, \sin\psi.$$

Thus, we obtain:

$$\alpha = \arctan\frac{\tan\varphi}{\cos\psi}, \qquad \theta = \arcsin(\cos\varphi \, \sin\psi). \tag{5}$$

The vertical 2D-scans are then extracted by selecting those points for which α lies within a small interval around the currently considered angle.
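Given the reconstruction above, the mapping of a raw beam to the virtual-scan angles is a two-line computation. The symbol names follow Eq. (5) as reconstructed here; the bucketing width is an illustrative assumption.

```python
import math

def virtual_scan_angles(phi, psi):
    """Eq. (5): from the in-plane beam angle phi and the scanner tilt
    psi, compute the horizontal projection angle alpha and the vertical
    angle theta of the beam."""
    alpha = math.atan2(math.tan(phi), math.cos(psi))
    theta = math.asin(math.cos(phi) * math.sin(psi))
    return alpha, theta

def bucket_into_vertical_scans(beams, width=math.radians(1.0)):
    """Group (d, phi, psi) beams into virtual vertical 2D-scans by
    quantizing alpha into intervals of the given width."""
    buckets = {}
    for d, phi, psi in beams:
        alpha, theta = virtual_scan_angles(phi, psi)
        buckets.setdefault(round(alpha / width), []).append((d, theta))
    return buckets
```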

4. Experimental results

Our approach has been implemented and tested using different platforms (see Fig. 4) and in indoor and outdoor environments. The robots were equipped with two 2D laser range scanners or with one 2D scanner mounted on a pan/tilt unit. Whereas the angular resolution of the laser of the outdoor system is 0.25°, the angular resolutions of the laser scanners installed on the indoor robots varied between 1° and 0.5°. The resolution of the measured distances is 1 cm, and the measurement error of the scanners lies below 20 cm (SICK PLS) and below 5 cm (SICK LMS). To obtain an appropriate density of 3D points during the indoor experiments, we moved the robots at relatively slow speeds of 10 cm/s.

The first experiment was carried out in the Wean Hall at Carnegie Mellon University. Here the robot traveled 10 m through a corridor and measured 140,923 points in 3D using a SICK PLS laser scanner. The corresponding raw 3D data shown in Fig. 5 consists of 267,355 triangles. The result of our simplification technique is shown in the left image of Fig. 10. This model consists of 3613 polygons and quads and 34,227 triangles. Accordingly, the reduction ratio is 100:14.2. The right image of Fig. 10 shows the result of the QSlim system [9], a state-of-the-art computer graphics algorithm for reducing the complexity of 3D models. Please note that this model contains the same number of polygons as the model obtained with our approach. Obviously, the quality of our model is significantly higher than the quality obtained by applying the QSlim system. In several tests we found that the models obtained with QSlim lack important details such as doors if the models are reduced to a complexity at which the walls look planar, as they do in the models generated by our approach.

The second experiment was carried out in a corridor of the Sieg Hall at the University of Washington in Seattle. The robot measured 1,933,018 three-dimensional data points. The resulting model is shown in the left image of Fig. 11. Our approach

Fig. 10. Approximations for the Wean Hall data set: our approach (left) and QSlim result with the same number of polygons (right).


Fig. 11. 3D data gathered in Sieg Hall, University of Washington (left image) and simplied model (right image).

needed 52 min to simplify the overall data set to 2471 polygons and 242 quads. Only 151,624 triangles could not be approximated by larger planar structures. This corresponds to a reduction ratio of 100:4.1. The resulting model is depicted in the right image of Fig. 11.

The third experiment was carried out using our outdoor system Herbert, shown in the right image of Fig. 4. The robot scanned parts of buildings on the campus of the University of Freiburg. A photograph of the corresponding area is depicted in the left image of Fig. 12. The size of this area is 40 m × 60 m. The right image of this figure shows the simplified model obtained with our algorithm. In this example, the reduction ratio was 100:8.9, although the scene contains several non-planar structures like trees.

Table 2 summarizes the figures obtained in the experiments reported here. As the last row illustrates, our

Table 2
Statistics for the three data sets obtained in the experiments

                              Wean Hall    Sieg Hall    Campus
Number of points              140,923      1,933,018    210,921
Time (min)
  Plane extraction            6:16         51:42        7:14
  Polygon merging             0:20         9:43         0:48
Raw model
  Number of triangles         267,355      3,765,072    377,896
Reduced model
  Number of polygons          2,626        2,471        255
  Number of quads             987          242          18
  Number of triangles         34,227       151,624      33,312
Reduction ratio               100:14.2     100:4.1      100:8.9

Fig. 12. Photograph of parts of two buildings at the University of Freiburg (left) and learned model (right).


algorithm yields reduction ratios between 100:14.2 and 100:4.1, which corresponds to a substantial reduction of the complexity. Visual inspection of the resulting models also showed that they preserve the structure of the environment very well.

5. Related work

Due to various application areas such as virtual reality, tele-presence, and access to cultural heritage, the problem of constructing 3D models has recently gained considerable interest. The approaches described in [2–5] rely on computer vision techniques and reconstruct 3D models from sequences of images. Allen and Stamos [1] construct accurate 3D models with stationary range scanners. Their approach also includes techniques for planar approximations in order to simplify the models. However, their technique computes the convex hull of polygons in the same plane and therefore cannot deal with windows or doors. Furthermore, their approach to region clustering assumes that the relative positions between consecutive scans are exactly known.

Systems similar to ours have been presented in [6,7]. Both techniques use a mobile platform to construct 3D models of an environment using range sensors. However, they do not include any means for planar approximation. Accordingly, our models have a significantly lower complexity. Recently, [18] developed an approach based on the EM-algorithm to determine planar approximations. In contrast to this method, our approach is inherently able to determine the number of planes in the data set.

The problem of polygonal simplification has been studied intensively in the computer graphics area [9–11]. The primary goal of these methods is to simplify a mesh so that the visual appearance of the original model and the simplified model is almost identical. Typical criteria used for simplification are the discontinuity of the surface normals of neighboring surfaces as well as the relative angle between the surface normal and the viewing direction. Because of the local noise in the data, however, these methods fail to extract planar structures. Accordingly, our approach provides significantly better approximations in such areas.

Furthermore, several researchers have worked on the problem of range-image registration in the con-

text of the construction of three-dimensional models and in the field of reverse engineering. A popular approach to combining several range images into a single model is the ICP algorithm [14]. This technique iteratively computes the displacement between two scans by matching the corresponding meshes point-wise. Because it does not exploit the odometry information and the model of the robot's motion, the resulting estimates are not as good as with our method. In practice we observed several situations in which the ICP algorithm diverges, whereas our 2D and 3D scan matching techniques provide a correct result [13].

A further class of approaches addresses the problem of data segmentation and surface approximation from segmented range data. For example, [19] presented tensor voting to detect surfaces in range data. The goal is to compute a segmentation of the data and then extract features from the segmented range data. Furthermore, there are several approaches that consider the problem of shape matching [20]. Finally, there are approaches that consider the problem of template matching in range scans, among them feature-based techniques [21] and methods using binary decision trees to speed up the search [22]. Compared to these techniques, our approach can be regarded as a pre-processing step that first reduces the data to planar and non-planar structures. Thus, these methods need only be applied to the non-planar parts of the models generated by our method.

6. Conclusions

We have presented an algorithm for acquiring 3D models with mobile robots. The algorithm proceeds in two stages: first, the robot pose is estimated using 2D and 3D scan matching algorithms. Second, the 3D data is smoothed by identifying large planar surface regions. The resulting algorithm is capable of producing 3D maps without manual intervention, as demonstrated using data sets of indoor and outdoor scenes.

The work raises several follow-up questions that warrant future research. Most importantly, the current 3D model is limited to flat surfaces. Measurements not representing flat objects are not corrected in any

way. As a consequence, the resulting model is still fairly complex. An obvious extension involves broadening the approach to include atoms other than flat surfaces, such as cylinders, spheres, etc. Additionally, an interesting question concerns robot exploration. The issue of robot exploration has been studied extensively for building 2D maps, but we are not aware of robot exploration algorithms that apply to the full three-dimensional case. This case introduces the challenge that the robot cannot move arbitrarily close to objects of interest, since it is confined to a two-dimensional manifold. Finally, extending this approach to multi-robot mapping and arbitrary outdoor terrain (e.g., planetary exploration) are worthwhile goals of future research.

Acknowledgements

The work reported here has partly been supported by the IST Programme of the Commission of the European Communities under contract number IST-2000-29456. Additionally, parts of it have been sponsored by DARPA's TMR Program (contract number DAAE07-98-C-L032), DARPA's MARS Program, and DARPA's CoABS Program (contract number F30602-98-2-0137). It is also sponsored by the National Science Foundation (regular grant number IIS-9877033 and CAREER grant number IIS-9876136), all of which is gratefully acknowledged.

References

[1] P. Allen, I. Stamos, Integration of range and image sensing for photorealistic 3D modeling, in: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2000, pp. 1435–1440.
[2] R. Bajcsy, G. Kamberova, L. Nocera, 3D reconstruction of environments for virtual reconstruction, in: Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, 2000.
[3] S. Becker, M. Bove, Semiautomatic 3-D model extraction from uncalibrated 2-D camera views, in: Proceedings of the SPIE Symposium on Electronic Imaging, San Jose, 1995.
[4] P. Debevec, C. Taylor, J. Malik, Modeling and rendering architecture from photographs, in: Proceedings of the 23rd International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1996.
[5] H. Shum, M. Han, R. Szeliski, Interactive construction of 3D models from panoramic mosaics, in: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), 1998.
[6] S. Hakim, P. Boulanger, F. Blais, A mobile system for indoors 3-D mapping and positioning, in: Proceedings of the Fourth Conference on Optical 3-D Measurement Techniques, 1997.
[7] S. Thrun, W. Burgard, D. Fox, A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping, in: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2000.
[8] L. Iocchi, K. Konolige, M. Bajracharya, Visually realistic mapping of a planar environment with stereo, in: Proceedings of the 2000 International Symposium on Experimental Robotics, Waikiki, Hawaii, 2000.
[9] M. Garland, P. Heckbert, Surface simplification using quadric error metrics, in: Proceedings of the International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1997, pp. 209–216.
[10] H. Hoppe, Progressive meshes, in: Proceedings of the 23rd International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1996.
[11] D. Luebke, C. Erikson, View-dependent simplification of arbitrary polygonal environments, in: Proceedings of the 24th International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1997.
[12] H. Moravec, Sensor fusion in certainty grids for mobile robots, AI Magazine 9 (2) (1988) 61–74.
[13] D. Hähnel, W. Burgard, Probabilistic matching for 3D scan registration, in: Proceedings of the VDI-Conference Robotik, 2002.
[14] P. Besl, N. McKay, A method for registration of 3-D shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (2) (1992) 239–256.
[15] M. Greenspan, G. Godin, A nearest neighbor method for efficient ICP, in: Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling (3DIM '01), 2001.
[16] H. Samet, Applications of Spatial Data Structures, Addison-Wesley, Reading, MA, 1989.
[17] J.-S. Gutmann, Robuste Navigation autonomer mobiler Systeme, Ph.D. Thesis, University of Freiburg, Germany, 2000.
[18] Y. Liu, R. Emery, D. Chakrabarti, W. Burgard, S. Thrun, Using EM to learn 3D models of indoor environments with mobile robots, in: Proceedings of the International Conference on Machine Learning (ICML), 2001.
[19] G. Medioni, M. Lee, C. Tang, A Computational Framework for Segmentation and Grouping, Elsevier, Amsterdam, 2000.
[20] R. Veltkamp, M. Hagedoorn, State-of-the-art in shape matching, in: M. Lew (Ed.), Principles of Visual Information Retrieval, Springer, Berlin, 2001.
[21] F. Ferrie, S. Mathur, G. Soucy, Feature extraction for 3-D model building and object recognition, in: A. Jain, P. Flynn (Eds.), 3D Object Recognition Systems, Elsevier, Amsterdam, 1993.
[22] M. Greenspan, The sample tree: a sequential hypothesis testing approach to 3D object recognition, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 1998.

Dirk Hähnel studied Computer Science at the University of Bonn. Currently he is a Ph.D. student at the Department of Computer Science at the University of Freiburg. His areas of interest lie in mobile robot navigation and especially in the learning of two-dimensional and three-dimensional maps.

Sebastian Thrun is presently the Finmeccanica Associate Professor of Computer Science and Robotics at Carnegie Mellon University. His research interests are centered around probabilistic robotics. Thrun received his Ph.D. from the University of Bonn (Germany) in 1995.

Wolfram Burgard studied Computer Science at the University of Dortmund and received his Ph.D. from the University of Bonn in 1991. Currently, he is Associate Professor at the Department of Computer Science at the University of Freiburg. He is also affiliated with the Center for Automated Learning and Discovery at Carnegie Mellon University. His fields of interest lie in robotics and artificial intelligence.
