3D Time-of-Flight Cameras for Mobile Robotics
Abstract: Recently developed 3D time-of-flight cameras have an enormous potential for mobile robotic applications, in particular for mapping and navigation tasks. This paper presents a new approach for the online adaptation of different camera parameters to environment dynamics. These adaptations allow state-of-the-art 3D cameras to be used reliably in real-world environments and enable capturing 3D scenes with up to 30 frames per second. The performance of the approach is evaluated with respect to different robot-specific tasks in changing environments. A video showing the performance of the approach is available at [1].
I. INTRODUCTION

Automatic environment sensing and modeling is a fundamental scientific issue in mobile robotics regarding the capacity to interact with three-dimensional real-world environments. The problem domain of autonomous robotics comprises two major tasks with different key aspects. The first one covers exploration issues while creating accurate three-dimensional maps. Here, high resolutions with precise 3D data as well as fast and accurate matching algorithms are required to create consistent scenes. The second task covers the question of exploration and navigation in known and unknown terrain. Real-time 3D computation of the scene in the moving direction of a robot is required to ensure obstacle avoidance, whereas precision is secondary. The real-time capability is also mandatory for mapping and surveying tasks if environment dynamics are considered.

The enhancements of photonic mixer devices that enable 3D image grabbing within a few milliseconds give an important impulse to visual 3D sensing. Today, 3D cameras with a resolution of roughly 20 thousand pixels and a frame rate of up to 30 frames per second are available [2], [3]. Besides the high data rate, the low weight and small size make this a very interesting sensor for the mobile robotics community. But similar to passive visual sensors, this type of camera is very sensitive to changing lighting conditions, which leads not only to imprecise data but also to completely wrong measurements. These errors are influenced by the physical properties of the sensor as well as by environmental conditions. For robotic issues, primarily the environmental influences are important since the real world is unpredictable. Without algorithms for online environment adaptation and data preprocessing this sensor is not suitable for autonomous mobile robotic tasks.

This paper is structured as follows. In the next subsection a brief state of the art of 3D sensors for mobile robotics is given. Section II briefly describes the CSEM Swiss Ranger SR-2 camera [4] and the pixel-wise calibration required for the experimental setup. In section III the performed experiments and the resulting approach for online scene adjustment are described. Section IV discusses the results and concludes the paper.
A. State of the art

Since the problem of sensing the spatial properties of the environment is fundamental for autonomous robotics, many different groups have been working in this field for a long time. Today the most common techniques for 3D sensing are based on CCD or CMOS cameras, laser scanners or, recently, 3D time-of-flight cameras. Due to their physical properties, the resolution and precision of ultrasonic sensors have been shown to be not suitable for fast and safe interaction of an autonomous system with its environment. Known CCD- or CMOS-camera based approaches to 3D robot vision are mainly stereo cameras or structure-from-motion techniques [5], [6]. Both have difficulties providing reliable navigation or mapping information for a mobile robot in real time and, like all passive visual sensors, they are difficult to handle in real-world environments with changing lighting conditions. In addition, some groups try to solve 3D modeling by using a planar 2D laser scanner and cameras, e.g., in [7]. A few other groups use either highly accurate, expensive 3D laser scanners or 2D laser range finders extended by a rotating device [8], [9], [10], [11], [12]. The registration of several laser scans taken at different positions is a common problem caused by the inaccuracy of pose information provided by other sensors
such as odometers. The Iterative Closest Point (ICP) algorithm is a standard approach to register those scans into a common coordinate system [13], [10], [11]. Other approaches use feature-based scan matching techniques for registering these 3D scans, e.g. [8]. High range, high accuracy and high reliability have to be balanced against high costs, large size, weight and power consumption. The potential of small laser range finders has already been identified. Some vendors have recently developed smaller laser range finders (but with limitations in range) and announced further improvements for the next series [14]. But the problem of a low sampling rate remains, which is why this sensor group is not applicable for detecting environment dynamics in three dimensions. A very new and promising technique are the already mentioned 3D cameras, which are based on photonic mixer device technology [2], [3]. First approaches to robot navigation with an evaluation prototype of a Swiss Ranger SR-2 were presented by Weingarten et al. in [15]. They have shown that obstacle avoidance with the Swiss Ranger has many advantages compared to a 2D laser scanner. In comparison to laser scanner based 3D sensors the SR-2 provides 3D information at a frame rate of about 30 frames per second (fps) and thus is much faster. But they also pointed out that camera calibration and data preprocessing are necessary to get stable sensor data. Outside of the robotics community, Oliver Gut focused on surveying and mapping tasks with the SR-2, where high precision is required, but he has shown that this is not achieved with the current version of the 3D camera [16]. He uncovered several erroneous influences of environmental or sensor-specific origin.

II. THE SWISS RANGER SR-2 CAMERA

Our experimental setup consisted of a Swiss Ranger SR-2 device connected via the USB 2.0 interface to a workstation running SuSE Linux 9.3. The Swiss Ranger device was mounted on a rotatable rack driven by a servo motor, which was used to adjust the device to several positions. Concatenating images while pivoting the device enabled a virtual 180 degree view.

A. Inside the Swiss Ranger SR-2

The camera belongs to the group of active sensors. It uses the phase-shift principle to determine distances. While the environment is being illuminated with infrared flashes, the reflected light is measured by a CCD/CMOS sensor. The camera provides amplitude data, intensity data and distance data, which are weakly dependent on each other: the amplitude data represents the amplitude of the incoming wave, the intensity its offset and the distance its phase shift. An image can thus formally be regarded as a matrix of per-pixel amplitude, intensity and distance values. The dependencies between these data values will be investigated in detail in section III. For a detailed description of the measurement principle, please refer to [15]. The camera comes with a resolution of 124x160 pixels. All measurements are organized by an FPGA, which provides a USB interface to access the data values. The FPGA can be configured by setting one or more of its eleven registers. The most important register concerns the adjustment of the integration time. It has a range from 1 to 255, in multiples of 255 µs. Finding the optimal value is investigated in section III-A.
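For illustration, the per-frame data can be organized as in the following sketch. This is only an illustrative Python model with our own class and field names, not the vendor's driver interface, and the 255 µs step size is taken from the register description above.

from dataclasses import dataclass
import numpy as np

ROWS, COLS = 124, 160     # SR-2 sensor resolution
T_INT_STEP_US = 255       # assumed step size of the integration-time register in microseconds

@dataclass
class SR2Frame:
    amplitude: np.ndarray   # amplitude of the reflected modulated wave (124x160)
    intensity: np.ndarray   # offset / background light level (124x160)
    distance: np.ndarray    # raw polar distance derived from the phase shift (124x160)
    t_int_register: int     # FPGA integration-time register value (1..255)

    def integration_time_us(self) -> int:
        # Convert the register value into microseconds (assumed conversion).
        return self.t_int_register * T_INT_STEP_US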
B. Camera calibration

A per-pixel calibration is needed to improve accuracy. The camera must be mounted at a defined distance in front of a white, smooth wall. First, some captures have to be taken so that the camera adapts to the environmental conditions; the vendor advises a minimum warm-up time of about 10 minutes. Next, the calibration run needs an optimally adjusted integration time, since oversaturation would falsify the measurements. A simple approach to determine the right integration time is discussed in section III-A. The distance offset register is set to 0 during calibration. We propose to determine a calibration matrix in polar form since these are the raw values provided by the camera, and some applications might use polar coordinates instead of Cartesian coordinates. The matrix is later used for distance correction by subtracting the related measurement values. Because of the camera's phase-shift principle, one runs into a modulo 2π problem. The only calibration routine known so far has been published by the vendor and does not consider this problem, which might cause a reduction of the effective range. To avoid negative values that can appear after subtracting the calibration matrix, we propose the following correction, which is extended by a modulo operation:

l_{i,j} = (m_{i,j} - o_{i,j} + r_e) mod r_e,   (1)

where i and j are the row and column indices, m_{i,j} is the measured and l_{i,j} the corrected polar distance value of the related pixel, o_{i,j} is the pixel-specific offset value and r_e the effective range of the sensor, i.e. 7.5 m for the SR-2.
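A minimal sketch of this correction in Python follows; it assumes numpy arrays, the function names are ours, and the offset matrix is assumed to have been recorded against the white wall as described above.

import numpy as np

R_E = 7.5  # effective range of the SR-2 in meters

def build_offset_matrix(wall_captures, wall_distance):
    # Pixel-wise offset o_ij: mean deviation of several wall captures from the
    # known wall distance (distance offset register set to 0 during calibration).
    return np.mean(np.stack(wall_captures), axis=0) - wall_distance

def correct_distances(measured, offsets, r_e=R_E):
    # Equation (1): l_ij = (m_ij - o_ij + r_e) mod r_e.
    # Adding r_e and taking the modulo keeps corrected values in [0, r_e) and
    # avoids negative distances after subtracting the calibration matrix.
    return np.mod(measured - offsets + r_e, r_e)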
III. EXPERIMENTS AND RESULTS

Weingarten et al. used an empirically found solution to suppress noise by decomposing the space into cells [15]. A cell with a minimum number of data points is considered occupied. This simplification fits well for navigation tasks but is insufficient for mapping. We first analyzed inaccurate data points to verify the conditions under which they appear. The examinations of Gut [16] provide an informative basis; they focus on surveying and mapping tasks, where high precision is required. The most interesting disturbing effect is that of light scattering on near objects. As already mentioned above, the provided data values are weakly dependent on each other. The intensity of an object, for example, depends on its distance, its alignment relative to the sensor and its surface properties, like color and texture. Amplitude and intensity values allow predicting the accuracy of the distance values. Our test scenarios represent common environments a robot will typically act in and therefore include different compositions of objects to demonstrate their influences.

A. Setting up the integration time

First of all, the integration time has to be set up. It is one of the most important parameters for getting stable data. It has to be adjusted to each scene, otherwise too high
saturation could cause erroneous effects. Figures 2(a) and 2(b) show the same scene taken with the Swiss Ranger device at two different integration times. To illustrate 3D data in this paper, we chose a false-color representation using the HSV model, with the hue starting at 60 degrees and running in reverse order to 240 degrees. Near objects appear yellow, far objects blue, and objects at medium distances red or magenta.
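The false-color mapping can be reproduced with a few lines of Python; this is our own sketch of the hue ramp described above, using the standard colorsys module.

import colorsys

def distance_to_rgb(d, d_max=7.5):
    # Hue starts at 60 degrees (yellow) for d = 0 and decreases, wrapping around
    # 0/360 degrees, down to 240 degrees (blue) at d = d_max, so medium distances
    # come out red or magenta.
    d = min(max(d, 0.0), d_max)
    hue_deg = (60.0 - 180.0 * d / d_max) % 360.0
    return colorsys.hsv_to_rgb(hue_deg / 360.0, 1.0, 1.0)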
Fig. 2. Two sample measurements using different integration times. 2(a) Measurement of a close hand with an integration time of 15 ms, which was definitely too high for that scene. 2(b) Measurement of a close hand with an integration time of 4 ms, which fits the scene better.
The noisy area is caused by a near bright object, in this case a close hand. The second measurement (with the smaller integration time) provides better results, which means that less noise is superposed on the image. A series of measurements to determine the dependency between integration time and data values of the Swiss Ranger device can be seen in figure 3. Each curve represents the data values determined from the same scene while varying the integration time. The significant range with constant distance data is also characterized by high mean amplitude values. The gap between the intensity graph and its linearization, as well as between the amplitude graph and its linearization, determines the degree of oversaturation. Only values in the linear parts feature accurate distances. Oggier et al. already reported the development of an algorithm to automatically select the best integration time for image acquisition [17]. Their algorithm considers the amplitude and intensity values of each pixel: the sum of both values is limited by a technically determined maximum, and a heuristically found threshold signals oversaturation of single pixel values. Instead of adjusting the integration time to prevent oversaturation of single values, we propose an algorithm that considers the mean accuracy. We found that the mean accuracy correlates with the mean intensity and mean amplitude in the linear segment. The most precise mean accuracy could be acquired with an integration time located near the amplitude's maximum. Determining this maximum is a complex task because the mean amplitude functions do not provide stable approximations of working points. A convenient trick is to translate the maxima into the intensity diagram; the intersection points lie approximately on a straight line (see figure 3). Mean intensity values can be calculated on the fly during capturing.
Fig. 3. Determining the optimal integration time for five scenes (close object at 0.3 m, 0.5 m and 1 m, a complex indoor scene and a wide indoor scene). Note: the data values represented here are register values with a range of 0 to 65532.
We relocated the approximating value I_A to a small interval below this line to best fit equation 2, since the image shows non-negligible saturation at the amplitude's maximum. Values for I_A between 15000 and 18000 provided reliable results. The integration time determination algorithm is based on the following scheme:
1) Calculate the mean intensity Ī_t from the intensity data set I_t at time t.
2) Determine the control deviation D_t = Ī_t - I_A.
3) Update the control variable c_{t+1} = V_p D_t + c_0 for grabbing the next frame, where c_{t+1} is the integration time, V_p the proportional closed-loop gain and c_0 a suitable initial value.
This approach assumes that two successive images mostly do not differ significantly, which holds best at high frame rates. Using mean values in the control loop has two advantages. First, these values can be determined on the fly while grabbing data and do not consume much computing power. Second, the provided mean values correlate with the mean accuracy and are thus very convenient for increasing the average information content. We have chosen a proportional controller because handling environment dynamics requires quick adaptations.
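A minimal Python sketch of this proportional control loop is given below. The function name and the concrete values of V_p, c_0 and I_A are our own assumptions; the sign of V_p reflects that a higher mean intensity calls for a shorter integration time.

import numpy as np

I_A = 16500.0   # target mean intensity, chosen from the reliable range 15000..18000
C_0 = 8         # assumed initial integration-time register value
V_P = -0.001    # assumed proportional gain (negative: too bright -> shorter integration time)

def next_integration_time(intensity_frame, i_a=I_A, v_p=V_P, c_0=C_0):
    i_t = float(np.mean(intensity_frame))        # 1) mean intensity of the current frame
    d_t = i_t - i_a                              # 2) control deviation D_t = I_t - I_A
    c_next = v_p * d_t + c_0                     # 3) control variable c_{t+1} = V_p * D_t + c_0
    return int(np.clip(round(c_next), 1, 255))   # clamp to the register range of the SR-2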
B. Accuracy determination

After the integration time has been determined, the distance accuracy has to be rated. Supposing that photonic interference is the main reason for inaccuracy, its influence has to be minimized to get precise data. A physical law relating the inaccuracy to photonic interference is defined as [18]:

ΔL = (L / √8) · (√I / (2A)),   (2)

where ΔL represents the inaccuracy, L the maximum distance, A the amplitude value and I the intensity value. The maximum distance is defined as

L = c / (2 f_mod),   (3)

where c is the speed of light and f_mod the modulation frequency. The Swiss Ranger device uses a frequency of 20 MHz, which is why it has an effective range of 7.5 meters. Assuming linearity, equation 2 must have its minima near the amplitude's maxima. It remains to be verified that integration time determination and inaccuracy thresholding can be combined. We used the inaccuracy threshold as a pre-filter, which rejects points with inaccuracies higher than a fixed level; the remaining points were considered for the mean inaccuracy. Figure 4 shows a scene taken in our robotic pavilion. This scene was captured at different distances to determine oversaturation effects.
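A sketch of this pre-filter in Python follows; it is our own implementation of equations (2) and (3), and the threshold value is an assumption that depends on whether amplitude and intensity are given in physical or register units.

import numpy as np

C_LIGHT = 299792458.0   # speed of light in m/s
F_MOD = 20e6            # modulation frequency of the SR-2 in Hz

def max_distance(f_mod=F_MOD):
    # Equation (3): L = c / (2 f_mod), about 7.5 m at 20 MHz.
    return C_LIGHT / (2.0 * f_mod)

def inaccuracy(amplitude, intensity):
    # Equation (2): Delta_L = L / sqrt(8) * sqrt(I) / (2 A), evaluated per pixel.
    return max_distance() / np.sqrt(8.0) * np.sqrt(intensity) / (2.0 * amplitude)

def prefilter(amplitude, intensity, distance, threshold=0.05):
    # Reject pixels whose estimated inaccuracy exceeds a fixed level and return
    # the remaining distance values together with their mean inaccuracy.
    delta_l = inaccuracy(amplitude, intensity)
    valid = delta_l <= threshold
    mean_inaccuracy = float(np.mean(delta_l[valid])) if np.any(valid) else float("nan")
    return distance[valid], mean_inaccuracy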
Fig. 5. Mean amplitude graph of two different scenes. The amplitude values of the upper curve run into oversaturation.
Fig. 6. Mean inaccuracy in dependency of the threshold value; 6(b) shows the mean inaccuracy for non-saturated scenes.
Fig. 4. Complex scene used for a measurement series to determine the integration time. Note that the poster has a glossy surface and quickly leads to oversaturation at short distances.
Two significant measurement examples are illustrated in figure 5. One of the measured scenes runs into oversaturation due to near objects; the other does not include near objects, which is why it shows good linearity. The measurement series also included the determination of the mean inaccuracy. Figure 6 shows the mean inaccuracy in dependency of the threshold value. Each minimum is located at nearly the same position; consequently, the threshold has no influence on the integration time determination. There is another indication that the right integration time has been found: the number of data values satisfying the given threshold. Figure 7 illustrates that with increasing integration time not only the accuracy of valid points grows (falling inaccuracy), but also the number of points satisfying the threshold constraint. Even when the optimal integration time is found, erroneous data remains, and pixel information with low distance accuracy is useless.
Fig. 7. Number of data values fitting a given threshold for the non-saturating scene.
The threshold experiment demonstrates a simple filter to identify some of these pixels. Further experiments were done based on this filter with the purpose of proving the sensor's reliability in difficult environmental situations.

C. Materials with high reflectivity and light emitting objects

Gut pointed out that multipath reflection and scattered light are types of environmental sensor influences. These effects could be confirmed during our tests. We took a small mirror and placed it in an indoor scene. Measurements towards the mirror were highly disturbed by the reflected light: not only noise superposed the measurement, but also erroneous data. Distance information taken at several integration time settings was sometimes completely incorrect. Similar effects could be observed with bright moving objects close to the camera. The brightness of those objects also influences the neighboring pixels of their representation on the chip; in animated distance data illustrations, the background seems to move. Figures 8(a) and 8(b) demonstrate this effect. Both figures show the same scene, except for a very close hand appearing at the bottom of the right picture. The hand was recolored in pink afterwards to mark its position.

The threshold filter developed in the previous section could determine the erroneous data values reliably. Results of applying the threshold filter to similar scenes are shown in figure 9. All background information which is highly disturbed could be filtered out. Remarkably, the gaps between two fingers could also be determined, although these gaps are located very close to their influencing areas, where scattered light dominates the scene.
Fig. 9. Applicability of the threshold filter to scenes with scattered light. 9(a) The threshold filter suppresses erroneous data caused by light scattered from a very close hand. 9(b) Scattered light highly influences low-amplitude areas.
The disturbance of the camera by its own emitted light induced us to carry out another experiment. Using an ordinary light source instead of the mirror, we came to the same results. The sensor is not only influenced by its own emitted light, but also by lamps or by sunlight, because ambient light partly has an infrared spectral component. The influence grew as the reflection rate of the scene decreased, e.g., as objects moved away. Figure 10 shows two drop lights, which were switched on and off during capturing. The drop light area in figure 10(b) is very noisy when the light is switched on.
Fig. 8. The Swiss Ranger is highly influenced by light scattered from near objects. These figures illustrate the problem that arises without applying filters; areas with low intensity are influenced more strongly by scattered light. 8(a) Scene including no near objects. 8(b) Scene including a very close hand at the bottom.
Fig. 10. Determination of the influence of light emitting sources. 10(a) Image taken with the Swiss Ranger device of two drop lights, which are switched off. 10(b) Image taken with the Swiss Ranger device of two drop lights, which are switched on.
Applying the threshold filter provided stable results: the noisy area of the drop light as well as the window area was filtered out completely. The results can be seen in figure 11.
Fig. 11. Determination of erroneous data. 11(a) Filtered image taken with the Swiss Ranger device of two drop lights, which are switched off. 11(b) Filtered image taken with the Swiss Ranger device of two drop lights, which are switched on.

IV. DISCUSSION AND CONCLUSION

In conclusion, the Swiss Ranger SR-2 time-of-flight camera is a very interesting sensor for mobile robotics, provided that reliable calibration, lighting adaptation and accuracy filtering are applied. This paper presents a novel approach to solve these three main problems for using the camera in robotic applications.
1) First, we presented a simple approach to calibrate the camera, which is an essential precondition for grabbing precise 3D data.
2) Second, we implemented an online algorithm that approximately determines the optimal integration time for the next sequence. Due to the measurement principle, a 3D camera is susceptible to oversaturation if close bright objects or light sources are present in the scene, so it is important to determine the correct adaptation parameters for each scene.
3) Third, we developed an accuracy filter based on the physical principle of photonic interference to identify inaccurate distance values. The threshold filter reliably meets this demand. How to handle the inaccuracies depends on the use case: for mapping, it might be best to discard this information since the accuracy is likely to be higher from a different perspective, whereas navigation tasks should still consider inaccurate information, with more or less emphasis as the case may be.
These three approaches enable the use of this camera for robotic navigation and mapping tasks. A video showing the performance can be found at http://www.ais.fraunhofer.de/surmann/videos/iros2006.mpg. The advantages of 3D cameras over previous visual sensing techniques, like laser range finders, are their small size, their low weight and, above all, their high performance of up to 30 frames per second. Especially for navigation tasks, the absence of a pan/tilt mechanism is also worth mentioning. Needless to say, a lot of work remains to be done. Future work will concentrate on 3D object segmentation and recognition as well as on our SLAM algorithms implemented for 3D laser range finders. Furthermore, we are looking forward to the next generation of 3D cameras.

REFERENCES
[1] S. May, H. Surmann, and K. Pervoelz, "Video: 3D cameras for mobile robotics," http://www.ais.fraunhofer.de/surmann/videos/iros2006.mpg, 2006.
[2] CSEM Centre Suisse d'Electronique et de Microtechnique SA, "Swiss Ranger SR-2," http://www.swissranger.ch, 2005.
[3] PMDTechnologies GmbH, "PMD cameras," http://www.pmdtec.com/e_inhalt/produkte/kamera.htm.
[4] CSEM Centre Suisse d'Electronique et de Microtechnique SA, "CSEM microtechnology," http://www.csem.ch, 2005.
[5] S. Se, D. Lowe, and J. Little, "Local and Global Localization for Mobile Robots using Visual Landmarks," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '01), Hawaii, USA, October 2001.
[6] C. Jennings and D. Murray, "Stereo vision based mapping and navigation for mobile robots," 1997. [Online]. Available: citeseer.ist.psu.edu/murray97stereo.html
[7] P. Biber, H. Andreasson, T. Duckett, and A. Schilling, "3D Modeling of Indoor Environments by a Mobile Robot with a Laser Scanner and Panoramic Camera," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '04), Sendai, Japan, September 2004.
[8] P. Allen, I. Stamos, A. Gueorguiev, E. Gold, and P. Blaer, "AVENUE: Automated Site Modelling in Urban Environments," in Proceedings of the Third International Conference on 3D Digital Imaging and Modeling (3DIM '01), Quebec City, Canada, May 2001.
[9] A. Georgiev and P. K. Allen, "Localization Methods for a Mobile Robot in Urban Environments," IEEE Transactions on Robotics and Automation (TRO), vol. 20, no. 5, pp. 851-864, October 2004.
[10] V. Sequeira, K. Ng, E. Wolfart, J. Goncalves, and D. Hogg, "Automated 3D reconstruction of interiors with multiple scanviews," in Proceedings of SPIE, Electronic Imaging '99, The Society for Imaging Science and Technology/SPIE's 11th Annual Symposium, San Jose, CA, USA, January 1999.
[11] A. Nüchter, K. Lingemann, J. Hertzberg, H. Surmann, K. Pervölz, M. Hennig, K. R. Tiruchinapalli, R. Worst, and T. Christaller, "Mapping of Rescue Environments with Kurt3D," in Proceedings of the IEEE International Workshop on Rescue Robotics (SSRR '05), Kobe, Japan, June (submitted).
[12] O. Wulf, K. O. Arras, H. I. Christensen, and B. A. Wagner, "2D Mapping of Cluttered Indoor Environments by Means of 3D Perception," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04), New Orleans, USA, April 2004, pp. 4204-4209.
[13] P. Besl and N. McKay, "A Method for Registration of 3D Shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, February 1992.
[14] Hokuyo Automatic, "Hokuyo, small size 2D laser scanner," http://www.hokuyo-aut.jp/, 2006.
[15] J. W. Weingarten, G. Gruener, and R. Siegwart, "A State-of-the-Art 3D Sensor for Robot Navigation," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '04), Sendai, Japan, September 2004.
[16] O. Gut, "Untersuchungen des 3D-Sensors SwissRanger," Master's thesis, Swiss Federal Institute of Technology Zurich, 2004.
[17] T. Oggier, B. Büttgen, F. Lustenberger, G. Becker, B. Rüegg, and A. Hodac, "SwissRanger SR3000 and First Experiences Based on Miniaturized 3D-TOF Cameras," http://www.swissranger.ch/pdf/application_sr3000_v1_1.pdf, CSEM / IEE / Fachhochschule Rapperswil, Switzerland, Tech. Rep., 2005.
[18] P. Seitz, T. Oggier, and N. Blanc, "Optische 3D-Halbleiterkameras nach dem Flugzeitprinzip," http://www.extenza-eps.com/extenza/loadhtml?objectidvalue=49132&type=abstract, CSEM, Tech. Rep., 2004.