RoboSoccer: Autonomous Robots in a Complex Environment
Pieter P. Jonker, Jurjen Caarls, Wouter J. Bokhove, Werner Altewischer, Ian T. Young*
Pattern Recognition Group, Faculty of Applied Sciences, Lorentzweg 1,
Delft University of Technology, NL-2628 CJ Delft, The Netherlands
ABSTRACT
We are participating in the international competition to develop robots that can play football (or soccer as it is
known in the US and Canada). The competition consists of several leagues each of which examines a
different technology but shares the common goal of advancing the skills of autonomous robots, robots that
function without a central hierarchical command structure. The Dutch team, Clockwork Orange, involves
several universities and the contribution of our group at the TU Delft is in the domain of robot vision and
motion. In this paper we will describe the background to the project, the characteristics of the robots in our
league, our approach to various vision tasks, their implementation in computer architectures, and the results
of our efforts.
Keywords: robots, soccer, computer vision, color segmentation and classification, Hough transforms, camera calibration,
analysis of robot performance
1 INTRODUCTION
Since at least the 16th century writers and artists have described machines that could serve man by relieving him of tedious
work and replacing him in dangerous environments. Coupled with each of these visions of the future has been the fear that
our creations could come to function outside of our control and themselves become dangerous. Perhaps the oldest legend is
that of the “Golem” who was constructed from clay by Rabbi Löw in Prague in 1580 to protect the community. In the
legend the Golem becomes a danger to the community. The word “robot” stems from the play “R.U.R.” written in 1920 by
the Czech Karel Capek. “R.U.R.” stands for “Rossum’s Universal Robots” and the word robot comes from the Czech word
“robota”, work. The concept of the robot was popularized in the 1950’s with a series of science fiction books and films such
as Isaac Asimov’s novels emphasizing the computational (and moral) aspects in the “positronic brain” and the “Three Laws
of Robotics”, the movie “Forbidden Planet”, the “Star Wars” heroes/comics C3PO and R2D2, the “Terminator”, and “Blade
Runner.” All of these examples would be irrelevant to modern research in robot vision, autonomous systems, and the latest
incarnation of artificial intelligence were it not for the fact that these robots displayed independent behavior that could be
achieved through sensors, actuators, and algorithmic analyses. The most timely example of this transfer from science fiction
to technological reality has been the recent work at Boeing to develop robotic attack fighters, the Boeing X-45 project.1
At the forefront of university-led research into robotics, one of the most enchanting and inspiring is the RoboCup
competition ( http://www.robocup.org/ ) where the goal is to develop robots that can play football or, as it is known in the
US and Canada, soccer. This competition is organized into five different leagues, as illustrated in Figure 1, each of which is
at a different scale. Our work has been in the mid-size league where we use a “Nomad Super Scout” (Nomadic Technologies,
USA) some of whose specifications are given in Table 1.
Table 1: Some specifications of the Nomad Super Scout.
    Height                         35 cm
    Diameter (Ø)                   41 cm
    Weight                         28 kg
    Payload weight                 5 kg
    Ground clearance               1.5 cm
    Max. speed                     1 m/s
    Max. acceleration              2 m/s²
    Battery power                  ≈ 400 W-h
    Playing time                   ≈ 1 hour
    Processors (2 levels)          low: Motorola 68332; high: Pentium II 233 MHz
    Pneumatic ball kicker          ≤ 16 bar
    Positional encoder resolution  translation: 756 counts/cm; rotation: 230 counts/deg
Each team consists of four players and the field is 6 m wide and 12 m long. Color plays an important role in the game and
the conventions are that one goal is blue, one goal is yellow, lines are white, the field is green, the ball is orange, on one team the players wear magenta shirts, and on the other team the players wear cyan shirts. This is illustrated in Figure 2 both schematically and with a color photo of part of the playing field.
* Correspondence – E-mail: young@ph.tn.tudelft.nl; WWW: http://www.ph.tn.tudelft.nl/; Telephone: +31-15-278-1416; Fax: +31-15-278-6740
Figure 1: (a) Simulation league based solely upon a computer simulation of two opposing teams and involving no physical robots; (b) small-size league involving “Lego-like” robots; (c) four-legged league involving “puppy-like” robots currently manufactured by Sony; (d) “mid-sized” league whose height and width can be around 40 cm; (e) humanoid league intended to mimic human size and bipedal motion.
Figure 2: (a) Standard playing field with four players on each team; (b) the Nomad Super Scout robot as delivered from the factory; (c) two players from one team in magenta and two players from the opposition in cyan. The green field, white lines, and orange ball are clearly visible.
2 INTERACTION WITH THE ENVIRONMENT
It is important to understand that each robot, while it listens to the reports from the three other team players and the “coach”
using wireless Ethernet, operates as an autonomous unit. The two actions that it can execute, movement or kicking, are not
the result of orders from a “military-like” hierarchical command structure, but rather based on its evaluation of its “world
picture” derived from the information it receives from others as well as from its own measurement sensors. Each robot has
two types of measurement sensors, odometry sensors for position and a 1-chip color CCD camera for robot vision.
Each robot is capable of translational and rotational motion and of kicking the ball with a specially-designed pneumatic
mechanism. All motion is accomplished through a two-wheel differential drive, visible in Figure 2b, whose geometric center
is at the center of the physical robot. The positional sensors measure the distance that the robot has traveled in a specific
translation as well as the angle through which it has turned during a rotation. The specifications of these two sensors are
described in Table 1.
The color camera that has been used is a Chugai NTSC camera model YC-02B and is shown in Figure 3b. The camera has
automatic white balance as well as automatic gain control. Because of the constrained and highly specific way in which
colors are used in this environment, we have disabled the automatic white balance and, instead, provided an absolute white
reference shown in Figure 3b. The lens used with the camera has a 94° field of view and good depth-of-field. The frame
grabber is a WinTV PC model used to provide 320 x 240 pixel images where each pixel is composed of {r, g, b} values
each of which is eight bits deep.
Figure 3: (a) Nomad Super Scout including computation, vision, communication, and ball-handling modules; (b) detail of the vision module showing the camera, the 94° wide-angle lens, the anti-glare polarizing filter, and the absolute white reference; (c) the 1-chip CCD camera leads to coarse-grain sampling of red and blue and medium-grain sampling of green; (d) the effect of the color sampling is shown in the color distortion in the sampling of the white lines.
CCD cameras that use only one chip are known to have problems in providing sufficient color resolution at every spatial
coordinate. This problem is illustrated in Figure 3c,d where the lack of {r, g, b} information at every pixel leads to errors at
the edges of the white line in the image of the playing field. It is to be expected that developments in camera technology
(e.g. Foveon cameras2) will eliminate this problem in the future but for now we must deal with it. A polarizing filter is also
visible in Figure 3b and this is used to reduce the effects of glare and specular reflections from various objects including the
ball whose surface, as shown in Figure 4, is quite glossy.
Figure 4: The glossy surface of the ball is evident. The kicking mechanism as well as the mechanism for transporting the ball across the field (“dribbling”) can also be seen.
3 CALIBRATION
Using a standard white reference card instead of the built-in automatic color balance is one example of what we consider to be
calibration. Within the domain of robot vision several other calibrations steps are required to compensate for geometric
distortion due to camera tilt and the use of a wide-angle lens (94°). Further, even if both of these distortions were absent
there would still be the change in the measured size of objects as a function of their axial distance to the camera lens. Each
of these problems is addressed in a different way.
The “tilt” comes from the fact that the camera, sitting at a height of some 40 cm above the floor, does not have a line-of-sight parallel to the floor but rather looks down at an angle of about 45°. This can be seen in Figure 3a,b. The tilt produces a parallax distortion which can be corrected by standard techniques from projective geometry. Three parameters must be estimated in a calibration step before a game begins: the focal length ƒ of the lens, the angle ϕ that the camera lens makes with the horizon, and a vertical distance measure YCCD in the plane of the CCD chip. With these numbers the non-linear mapping from the original coordinate geometry to the corrected geometry can be computed and loaded into look-up tables before a game begins. During an actual game the use of look-up tables provides for high-speed tilt correction.
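This per-row correction can be sketched as follows. Under a pinhole model, a camera at height h looking down at angle ϕ below the horizon sees, at vertical sensor offset y, a floor point whose distance follows from simple trigonometry; precomputing this for every row gives the kind of look-up table described above. The numeric parameters in the example (focal length, pixel pitch, row count) are illustrative assumptions, not our calibrated values.

```python
import math

def ground_distance(y_ccd, f, phi, h):
    """Floor distance to the point imaged at vertical sensor offset y_ccd
    (measured below the optical axis).  f is the focal length in the same
    units as y_ccd, phi the tilt below the horizon (radians), h the
    camera height above the floor."""
    depression = phi + math.atan2(y_ccd, f)  # total angle of the ray below the horizon
    return h / math.tan(depression)

def build_tilt_lut(rows, f, phi, h, pixel_pitch):
    """Per-row floor distances, precomputed once before a game.

    Rows are counted downward from the optical axis; rows above the
    horizon would need separate handling in a full implementation."""
    return [ground_distance((r + 0.5) * pixel_pitch, f, phi, h)
            for r in range(rows)]
```

During a game the table is only indexed, never recomputed, which is what makes the correction cheap enough for real time.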
The use of a wide-angle lens means that a form of radial distortion occurs in the image. While for a normal lens this effect
would be negligible, in our case the distortion is not acceptable. This problem has been studied in-depth by Tsai3 and we
have implemented his solution in the form of look-up tables. Again, this correction requires a pre-game calibration step. The
use of the look-up tables is illustrated in Figure 5a,b and the results are shown in Figure 5c,d.
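As a sketch of the look-up-table mechanism (Tsai’s full calibration estimates more parameters; here a single radial coefficient k1 stands in for it, and k1, the distortion centre, and the image size are all hypothetical values):

```python
def build_undistort_lut(width, height, k1, cx, cy):
    """Look-up table mapping each corrected pixel to its source pixel.

    One-coefficient radial model: the source sample for a corrected
    pixel at radius r from the centre (cx, cy) lies at radius
    r * (1 + k1 * r^2).  Corrected pixels whose source falls outside
    the image get None (the red pixels of Figure 5)."""
    lut = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            scale = 1.0 + k1 * (dx * dx + dy * dy)
            sx, sy = int(round(cx + dx * scale)), int(round(cy + dy * scale))
            row.append((sx, sy) if 0 <= sx < width and 0 <= sy < height else None)
        lut.append(row)
    return lut

def undistort(image, lut):
    """Apply the precomputed table to a 2-D list of pixel values."""
    return [[None if src is None else image[src[1]][src[0]] for src in row]
            for row in lut]
```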
Figure 5: Correcting for radial distortion. Pixels for which no image information is available after application of the coordinate transformation are shown in red. (a) correction terms for the horizontal component in the look-up table; (b) correction terms for the vertical component in the look-up table; (c) distorted original scene; (d) distortion removed after application of the Tsai algorithm.
Another “distortion” is caused by the apparent decrease in size of the ball as its distance to the camera gets larger. We
calibrate for this by first using our prior knowledge about the actual size of the ball and then calculating what its size would
be for every possible location of its center y coordinate in the image. This yields a matrix of expected radius r versus y
which can (again) be used in a calibration look-up table to analyze the image. This effect is illustrated in Figure 6 where two
soccer balls are simulated on a grass field. In Figure 6a we see the two orange spheres as approximately equal in size. But
when the camera viewpoint is changed to a position similar to that of our robot camera, as shown in Figure 3a,b, this
results in the situation depicted in Figure 6b. The two spheres are no longer of the same size and the tilt of the camera places
the sphere that is closer to the camera as lower in the image. This correlation between size, expressed in a radius r, and
vertical position in the image, as expressed as y, is the basis of a mechanism to estimate the position of the soccer ball that
will be described below.
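The radius-versus-y table can be sketched by combining the tilt model of this section with the pinhole relation that apparent size shrinks as focal length over range. The ball radius (10.5 cm) and all camera numbers below are illustrative assumptions, not our calibrated values, and the real table is built in a pre-game calibration step rather than from this closed form.

```python
import math

def radius_lut(rows, f_pix, phi, h, pitch, ball_radius=0.105):
    """Expected apparent ball radius (pixels) for each image row.

    Each row's floor distance follows the tilt model; the apparent
    radius then shrinks as f / slant_range under a pinhole model."""
    lut = []
    for r in range(rows):
        depression = phi + math.atan2((r + 0.5) * pitch, f_pix * pitch)
        floor_dist = h / math.tan(depression)
        slant = math.hypot(floor_dist, h)  # camera-to-ball range
        lut.append(f_pix * ball_radius / slant)
    return lut
```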
Figure 6: Simulated environment for analyzing the relationship between perceived size and position in a projected image (a) two soccer balls in a 3-D environment; (b) the two soccer balls as seen from the robot camera position.
In summary, all three of the distortions described above are handled by calibration steps performed before a game and implemented as correction look-up tables during a game.
4 IMAGE SEGMENTATION
Our strategy for analyzing the images in real time is to first classify every pixel on the basis of its color into one of several
classes: playing field (green), ball (orange), goal (yellow), goal (blue), lines (white), team players (magenta or cyan),
opposition players (cyan or magenta), and robot bodies (black). While the color space provided by the camera is {r, g, b} a
more appropriate space might be {h, s, i}. Unfortunately, the mathematical transformation from {r, g, b} to {h, s, i} in
software would take too much time and the hardware chip to do this would involve an extra circuit board which would cost
us energy and, as indicated in Table 1, our energy budget is strictly limited. We have chosen, therefore, to use a modified
form of the {y, u, v} color space which we refer to as {y’, u’, v’}. The transformation is given by:
    ⎡ y' ⎤   ⎡ 1  1  1 ⎤   ⎡ r ⎤
    ⎢ u' ⎥ = ⎢ 1 −1  0 ⎥ • ⎢ g ⎥        (1)
    ⎣ v' ⎦   ⎣ 0 −1  1 ⎦   ⎣ b ⎦
and only additions and subtractions are required to produce this representation. The actual color information is contained in the coordinates u’ and v’. The division of this two-dimensional color space into classification regions is based on a “pie-chart” approach, where different angular regions are considered as describing the colors associated with certain objects. The steps in this pixel classification scheme are shown in Figure 7.
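A sketch of the scheme: the transform is Eq. (1), while the sector boundaries, chroma threshold, and white level below are hypothetical values; the real regions are tuned to the venue lighting during calibration.

```python
import math

def yuv_prime(r, g, b):
    """Eq. (1): only additions and subtractions are needed."""
    return r + g + b, r - g, b - g

# Hypothetical angular sectors (degrees) in the (u', v') plane.
SECTORS = [("magenta", 20, 70), ("blue", 70, 130), ("cyan", 150, 205),
           ("green", 205, 250), ("yellow", 250, 295), ("orange", 295, 345)]

def classify(r, g, b, min_chroma=20, white_level=400):
    """Pie-chart pixel classification in {y', u', v'} space."""
    y, u, v = yuv_prime(r, g, b)
    if math.hypot(u, v) < min_chroma:   # near the origin: no usable hue
        return "white" if y > white_level else "black"
    ang = math.degrees(math.atan2(v, u)) % 360.0
    for name, lo, hi in SECTORS:
        if lo <= ang < hi:
            return name
    return "unknown"                     # falls in a gap between sectors
```

Pixels whose angle falls between sectors are left unclassified, which is harmless: they are simply ignored by the later object-finding steps.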
Figure 7: Pixel classification using {u’, v’} (a) an original color scene; (b) the position of each pixel in {u’, v’} space; (c) the result of classification and labeling with a pseudo-color.
The next step is to go from classification at the level of pixels to finding the objects (lines, goals, ball, etc.). To accomplish
this, a fast, intermediate step is used to find regions-of-interest (ROIs) where we can expect with a reasonable certainty that
the objects will be found within the pixel-classified image. The basis of this step is the use of projection histograms, a tool
that was developed and used more than 30 years ago. The basic concept is illustrated in Figure 8 for two of the class labels,
orange (ball) and black (robot). Looking across each row, i, we sum the number of pixels that have been labeled with a
specific color. On the third row in Figure 8, for example, we see that there is a total of two orange pixels and three black
pixels. Continuing this for all rows and all columns we thus compute two “orange” histograms (one for the columns and
one for the rows) and two “black” histograms. This procedure is, of course, not limited to just these two colors but is
instead applied to all the colors in our labeled image except green; we do not need to know where the playing field is.
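The projection step amounts to counting labeled pixels along each row and each column; a minimal sketch, assuming the labeled image is a 2-D list of class names:

```python
def projection_histograms(labels, color):
    """Row and column projection histograms for one class label.

    `labels` is a 2-D list of per-pixel class names (the output of the
    pixel classification step)."""
    row_hist = [sum(1 for px in row if px == color) for row in labels]
    col_hist = [sum(1 for row in labels if row[j] == color)
                for j in range(len(labels[0]))]
    return row_hist, col_hist
```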
Figure 8: Projection histograms for two of the colors (orange and black) and for the rows and columns associated with each color.
We now filter these histograms to reduce the effect of noise and then determine the ROIs. The filtering algorithm is a
straightforward two-step operation and illustrated in Figure 9.
Figure 9: Filtering projection histograms (a) original row histograms taken from Figure 8; (b) thresholded histograms (value in cell > 1); (c) 2-out-of-3 filtered result (2 out of 3 consecutive entries > 1); (d) ROIs identified for objects in the image. The colors of the rectangular boundaries have no special significance and are only intended to make the ROIs visible.
The first step in filtering the histograms is thresholding. Histogram values greater than one (> 1) in the projection
histogram (Figure 9a) are set to one; otherwise they are set to zero. The second step is to look at three consecutive values of
the thresholded result as shown in Figure 9b. If less than two (< 2) of the entries are one then the center value in the
window of three is set to zero; otherwise it is set to one. The result in Figure 9c shows that two objects have been found in
the black-labeled image and one object has been found in the orange-labeled image. By “found” we mean that a rectangular
ROI has been identified around labeled objects, their labels are still known, and noise-like objects have been rejected. An
example of applying such a projection histogram procedure is shown in Figure 9d where the nine ROI bounding boxes are
indicated in arbitrary colors. The final step is classification of the objects within the ROIs.
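The two filtering steps and the run extraction can be sketched as follows; the treatment of the first and last bins, where the window of three is truncated, is an assumption of this sketch rather than something the text specifies.

```python
def filter_histogram(hist):
    """Two-step noise filter of Section 4.

    Step 1: threshold -- counts greater than one become 1, else 0.
    Step 2: 2-out-of-3 majority -- a bin survives only if at least two
    of the three bins centred on it are set (truncated at the borders)."""
    t = [1 if v > 1 else 0 for v in hist]
    out = []
    for i in range(len(t)):
        window = t[max(i - 1, 0):i + 2]
        out.append(1 if sum(window) >= 2 else 0)
    return out

def runs(filtered):
    """Start/end (inclusive) indices of each surviving run of 1s."""
    spans, start = [], None
    for i, v in enumerate(filtered + [0]):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i - 1))
            start = None
    return spans
```

Intersecting the surviving row runs with the surviving column runs for each color then yields the rectangular ROIs of Figure 9d.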
5 OBJECT CLASSIFICATION
The label derived from the color segmentation and now attached to a ROI gives a reasonable estimate of what object can be
found inside the ROI. Playing soccer, however, requires more information. We must be able to move to the ball. We must
know where the goal is, where the boundary lines of the playing field are, and the status of the other robots that we have
seen. Are they on our team or part of the opposition?
The principal tool that we use, in addition to color, to identify an object inside a ROI and determine its parameters is the
Hough transform. This well-known technique4 allows us to determine which objects are straight lines and which are circles.
The former is essential if we are to know where we are in the absolute coordinate system of the playing field. The odometry
measurements are not sufficient for this as it is too easy for the robot to lose its position with the relative information
provided by the position encoders. (This can happen, for example, when a robot “spins its wheels” after collision with
another object.) Our implementation of the Hough transform for straight lines is standard and will not be elaborated upon
here. Our implementation of the Hough transform for circles takes into account the fact that we do not know the radius of the circular object that we are seeking but, if we can find the center coordinates of the ball (Xc, Yc), then knowledge of Yc together with our knowledge of the true radius of the soccer ball will tell us where the ball is on the playing field. We
proceed as illustrated in Figure 10. We generate an image of the contour pixels on the orange object inside the ROI. We start
with a guess of the ball’s radius based upon its possible y position in the playing field as illustrated in Figure 6b, that is,
we choose a (weighted) average starting estimate for Ro. We now apply the Hough transform to the representation of a circle
using the two-parameter model:
    (x − Xc)² + (y − Yc)² = Ro²        (2)
The transform space is now characterized by (Xc, Yc) and for each pixel on the contour we obtain a circle in the new space. Using peak detection in the parameter space leads to the most likely value for the center coordinates of the ball. We now proceed to that position in the calibration image that provides a revised estimate of the proper value of R at that position y = Yc and then, using the revised estimate R = R1, we repeat the Hough procedure. This converges quickly to a satisfactory estimate of the position of the ball on the playing field. This iterative procedure is summarized in Figure 10.
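A minimal sketch of the procedure, with a coarse 5° angular sampling of the voting circles chosen here for brevity (the real accumulator resolution and iteration count are implementation choices not stated in the text):

```python
import math
from collections import defaultdict

def circular_hough(contour, radius):
    """One pass of the two-parameter circular Hough transform of Eq. (2).

    Each contour pixel votes for every candidate centre at the given
    radius; the accumulator peak is the most likely centre (Xc, Yc)."""
    acc = defaultdict(int)
    for x, y in contour:
        for deg in range(0, 360, 5):
            a = math.radians(deg)
            centre = (round(x - radius * math.cos(a)),
                      round(y - radius * math.sin(a)))
            acc[centre] += 1
    return max(acc, key=acc.get)

def find_ball(contour, radius_of_y, r0, iters=3):
    """The iteration of Figure 10: run the Hough with the current radius
    estimate, re-read the expected radius at the detected centre's row
    from the calibration table, and repeat."""
    r = r0
    for _ in range(iters):
        xc, yc = circular_hough(contour, r)
        r = radius_of_y(yc)
    return (xc, yc), r
```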
Figure 10: Circular Hough transform for ball finding (a) an iterative procedure starts with a default estimate for the ball’s radius, which is updated on the basis of the ball’s “vertical” position in the image; (b) the ball’s position indicated by the black circle and the two field lines indicated by the red lines as found using Hough transforms.
6 COMPUTER STRATEGIES
It is obvious that all of the computations must be done in the most time-efficient and energy-efficient manner possible. To
accomplish this we have looked at the use of two different types of processors for the image processing and analysis. These
two processors are an Intel Pentium processor running at 233 MHz and a 40 MHz IMAP Processor board (NEC
Corporation, Kawasaki, Japan). The former is a general purpose processor running under Linux Red Hat v5.2. The latter is a
board containing 8 special chips each of which in turn contains 32 processor elements (PEs) for a total of 256 PEs on a
single board. The Pentium architecture is well-known; a detailed description of the IMAP board can be found in vd Molen et
al.5 and a picture of the layout is shown in Figure 11.
Figure 11: IMAP-Vision board with 8 × 32 processor elements.
The IMAP is significantly faster than the Pentium as the figures in Table 2 show. But the IMAP also uses a considerable
amount of current and it is not possible to replace the Pentium with the IMAP. The Pentium is needed for the motion,
communication, and kicking systems. The IMAP can only serve as an additional processor dedicated to image processing
tasks. We therefore have to consider the situations where it is appropriate to make use of this technology. The Pentium uses
a certain average amount of power. The three Nomad robots that move over the entire playing field also consume a
significant amount of energy in the course of a game through their mechanical actions. The goalie robot, however, has only
a limited amount of mechanical motion but needs quick visual “reflexes” to defend the goal. As a result we have equipped the
goalie with the IMAP-Vision system. The other three robots must rely on the Pentium for their image processing. A
comparison of the effectiveness of these two architectures is given in Table 2.
    Process                       Pentium, 64 MB RAM      IMAP-Vision, 256 processors
                                  (233 MHz clock)         (40 MHz clock)
    Color classify objects        10 ms                   7.7 ms
    Find ROIs                     10 ms                   2.6 ms
    Circular Hough transform      50 ms                   3.9 ms
    Linear Hough transforms       50 ms                   4.2 ms
    Find playing field corners    2 ms                    2.1 ms
Table 2: Comparison of the two processors on the same tasks.
These are somewhat specialized tasks. When a 300 MHz Pentium II with MMX architecture was compared against the same
IMAP-Vision card on a suite of standard image processing algorithms (e.g. add, AND, 3x3 convolution, Sobel, dilation,
etc.), the IMAP-Vision was 10 times faster. Both architectures have been updated and the latest version of the IMAP card
remains at least 5 times faster than the latest Pentium architecture.6
7 SUMMARY AND CONCLUSIONS
The actual effectiveness of our approach must be judged not by the speed of individual algorithms but by the performance at
tasks. The detection of all major objects is good with the weakest aspect being the detection of the black robot bodies. The
color black also means “no signal” which in turn means a poor signal-to-noise ratio in the relevant pixels. The color
classification is fine with the only problem being in the recognition of the team colors, magenta and cyan. Angular
measurements are accurate to within 1°. Distance measurements are accurate to within 5% meaning 5 cm at a distance of 3
meters and 2 cm at a distance of 40 cm. This latter figure is important in ball-handling. The image processing algorithms
take an average of 26% of the Pentium processing capacity when images are analyzed at 15 fps. (This is the rate that has
been chosen in using images of 320 x 240 pixels.) The remaining capacity is available and needed for the other
aforementioned tasks.
Our team “Clockwork Orange” has not as yet won any championships but there has been a marked improvement in our
performance. In the June 2001 German open we lost 2 games and tied 2 games. We scored 0 goals and the opposition scored
7. Two months later at the World Cup competition we won 4 games, tied 1, and lost 3. Our goal balance changed even more
dramatically; we scored 16 goals and the opposition 17.
There are many components of a soccer team that are outside the scope of image processing such as motion strategies and
team strategies. All of these must be addressed to have a successful team. At this moment we are reasonably satisfied with
the image processing components although we hope to improve the speed and accuracy of all of our visual detection tasks.
ACKNOWLEDGMENTS
This work was partially supported by the Delft University of Technology, the Philips Center for Fabrication Technology,
Dr. Sholin Kyo and the NEC Incubation Center in Kawasaki, Japan, and the Dutch Ministry of Economic Affairs through
the Senter program “IOP Beeldbewerking”. We gratefully acknowledge their support.
REFERENCES
1. Russ Mitchell, “The Pilot, Gone. The Market, Huge,” New York Times, Business section, New York, 31 March 2002, pp. 1-2.
2. John Markoff, “Digital Sensor Is Said to Match Quality of Film,” New York Times, Business/Financial section, New York, 11 February 2002, p. 1.
3. Roger Y. Tsai, “A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses,” IEEE Journal of Robotics and Automation, RA-3(4), pp. 323-344, 1987.
4. P.V.C. Hough, “Method and means for recognizing complex patterns,” US Patent 3,069,654, 1962.
5. M. vd Molen, S. Kyo, and W. Bokhove, Documentation for the IMAP-VISION image processing card and the 1DC language, NEC Incubation Center, Kawasaki, Japan, 1999.
6. S. Kyo, T. Koga, and S. Okazaki, “IMAP-CE: A 51.2 GOPS video rate image processor with 128 VLIW processing elements,” presented at ICIP-2001, Thessaloniki, Greece, pp. 294-297, 2001.