Min-Cut Based Segmentation of Point Clouds
Aleksey Golovinskiy and Thomas Funkhouser
Princeton University

Figure 1. Example segmentations of a traffic light and a car.
Abstract
We present a min-cut based method of segmenting objects in point clouds. Given an object location, our method
builds a k-nearest neighbors graph, assumes a background
prior, adds hard foreground (and optionally background)
constraints, and finds the min-cut to compute a foreground-background segmentation. Our method can be run fully automatically, or interactively with a user interface. We test
our system on an outdoor urban scan, quantitatively evaluate our algorithm on a test set of about 1000 objects, and
compare to several alternative approaches.
1. Introduction
As 3D scanning technologies advance, the promise of
ubiquitous 3D data is fast becoming reality. In particular,
3D point clouds of entire cities are becoming available. This
explosion of data fuels a need for algorithms that process
point clouds. Segmenting point clouds into foreground and background is one such fundamental problem. Specifically, given an estimate for the
location of an object, the objective is to identify those points
that belong to the object, and separate them from the background points. Besides the essential task of separating foreground from background, segmentation can be helpful for
localization, classification, and feature extraction. In this
paper, we describe and evaluate a min-cut based segmentation algorithm that was summarized in [6] as a part of a
system to detect objects in outdoor urban scans.
The problem of segmenting objects in 3D point clouds is
challenging. The foreground is often highly entangled with
the background. The real-world data is noisy. Sampling is
uneven: ground-based scans have point densities that are highest in the direction from which the scan was taken, and airborne scans
have poor sampling for nearly vertical surfaces. In addition,
data sets such as the one studied in this paper consist of
point clouds aggregated from both land and airborne scans,
leading to considerable discrepancies in sampling rates between different objects and often different surfaces of the
same objects. Finally, non-reflective surfaces such as windows are missing. Examples of results of our method overcoming some of these difficulties are shown in Figure 1.
Since large-scale outdoor point cloud scans are an
emerging source of data, there is not much work describing segmentations of such scans. What work exists
mostly focuses on the extraction of geometric primitives
or parts ([13, 18]) rather than entire objects. We adapt the
techniques of computer vision ([1]) and computer graphics
(e.g., [9]), where graph-cut based methods have been used
to, respectively, separate foreground and background in images, and decompose 3D surfaces into parts. We extend
such methods to 3D point clouds. Unlike images, we cannot use colors or textures as cues, and unlike most computer
graphics (and CAD) segmentation problems, the input is a
noisy point cloud representing a scene, rather than a clean
surface model of an individual object.
We propose a min-cut based segmentation method. Our
method works by creating a nearest neighbors graph on the
point cloud, defining a penalty function that encourages a
smooth segmentation where the foreground is weakly connected to the background, and minimizing that function
with a min-cut. The method was summarized as part of a
system of object detection for urban outdoor scenes in [6];
in this paper, we expand on that summary with a more detailed description of the algorithm and discussion of the design choices, examples, and an in-depth evaluation.
2. Previous Work
We summarize previous work in three related areas: segmentation of point clouds, part decomposition of 3D objects, and segmentation of images.
Point Cloud Segmentation. Some work has been done
on segmenting point clouds. In some scenarios, such as [3],
the input is a point cloud representing a single object, and
the goal is to decompose the object into patches. The algorithms proceed by either reconstructing a mesh and then
segmenting it, or by segmenting the point cloud directly.
While some work has been done on segmentation of point
clouds in scenes, the emphasis is usually on extracting geometric primitives (such as in [13] and [18]) using cues
Figure 2. Overview of our system. (a) The system takes as input a point cloud near an object location (in this case, a short post). (b)
A k-nearest neighbors graph is constructed. (c) Each node has a background penalty function, increasing from the input location to the
background radius (visualized with color turning from green to red as the value increases). (d) In the automatic version of our algorithm,
a foreground point is chosen as a hard constraint (in the interactive mode, the user chooses hard foreground and background constraints).
The resulting segmentation is created via a min-cut (e).
3. Overview
Given a suspected location of an object, we need an algorithm that returns the foreground points that belong to the
object. In particular, we are interested in objects found in
cities that range in size from fire hydrants to traffic lights.
Desirable properties of a segmentation algorithm include:
Correctness: it would be best to get the foreground as
accurately as possible (with respect to precision/recall)
Input parameters: since over- and under-segmentation
are inevitable, it is helpful for an algorithm to accept additional intuitive parameters describing a priori assumptions about the object (for example, the approximate
horizontal radius to the background). This was used
in the system of [6], for example, to request multiple
segmentations of an object for more robust feature extraction.
Speed: a segmentation algorithm is likely to be executed for many locations in a large point cloud, so running time is important.
The intuition behind our algorithm is that a good foreground segmentation consists of points that are well-connected to each other, but poorly connected to the background. An overview of our method is shown in Figure 2. Given an input scene (Figure 2a), we build a nearest-neighbor graph (Figure 2b) to encourage neighboring points
to be assigned the same label. Then, given as input an expected horizontal distance to the background, we create a
background penalty (Figure 2c) that encourages more distant points to be in the background. In an automatic or interactive manner, we add hard constraints for foreground and,
optionally, background points (Figure 2d). Our algorithm
returns the segmentation generated by the min-cut, which
(i) minimizes the cut cost of the nearest neighbor graph, (ii)
minimizes the background penalty, and (iii) adheres to the
hard foreground or background constraints (Figure 2e).
Section 4 describes the construction of the graph and
background penalty. Then, Section 5 describes the addition
of hard constraints in the fully automatic regime of our algorithm (including a method of automatically choosing the
background radius), the addition of hard constraints in the
interactive regime, and accelerations for greater efficiency.
4. Basic Setup
The input to the segmentation algorithm is (i) a 2D location and (ii) a suspected background radius (horizontal
radius at which we assume the background begins). We
estimate the ground plane with iterative plane fitting, and
remove points close to the ground (within 0.2 m). We then
construct a graph between neighboring points, which ensures smoothness of segmentation, and a soft background
penalty, which encourages points far from the object location to be labeled as background.
4.1. Graph
We create a graph representing the structure of the point
cloud, where closer points are more strongly connected.
Specifically, we construct a k-nearest neighbors graph on
the input points (we use k = 4). The edges of this graph
have weights that decrease with distance, so that if edge i connects points at a distance d_i, its weight is w_i = exp(-(d_i/σ)^2), where we use σ = 0.1 m (a common spacing of points in our data). Because the k-nearest neighbors graph often results in several disconnected components
within each foreground object, we connect the closest point
pairs of disconnected components.
Note that the cut cost of a potential segmentation of this
graph takes into account both distances between points in a
foreground/background cut, and the density of points on the
boundary via the number of broken link edges. This makes
the min-cut algorithm more robust to spurious connections
between foreground and background. Note also that the
construction of the graph makes it adaptive to the point
cloud resolution, without requiring a pre-defined threshold.
When the min-cut is computed, this graph ensures that the
segmentation is smooth (neighboring points are more likely
to be assigned to the same segment) and that a larger separation between foreground and background is encouraged.
5. Performing Segmentation
The previous section described how the graph is
set up to encourage two properties with soft constraints:
a smoothness error that encourages nearby points to have
the same label, and a background penalty that encourages
points close to the background radius to be in the background. It remains to specify constraints that encourage
points to be in the foreground. Our algorithm can be run in
two regimes: automatically, and interactively, both adding
hard constraints. In both cases, the final segmentation is
found with a min-cut. Below, we describe the automatic
regime (including automatically choosing the background radius), followed by the interactive regime.
Note that the nearest-neighbors graph ensures that each successive segmentation (Figure 5c, d) is a smooth extrapolation of the constraints.
5.2. Interactive Segmentation
While no automatic algorithm will be completely successful, in some scenarios it may be practical to use an
interactive segmentation tool. Such a tool should follow
the user-constraints at interactive rates, while automatically
making a reasonable guess in unconstrained regions, and
allowing any segmentation to be reached with sufficiently
many constraints. Similar to the ideas of [1], our min-cut
algorithm is easily set up for such a tool.
The interactive algorithm starts with the graph and background weights given in the previous section. Instead of assuming a foreground constraint, as in the automatic algorithm,
we allow the user to iteratively add (and remove) points as
hard background or foreground constraints. The segmentation is re-calculated as the min-cut under these constraints.
The interactive tool is shown in Figure 5. To create a
segmentation, the user looks at the input scene (Figure 5a),
and selects a radius that includes the object to segment (Figure 5b). Note that for an object such as the shown newspaper box, automatic segmentation is very difficult since the
box is connected to adjacent newspaper boxes. To perform
manual segmentation, the user adds foreground and background constraints as necessary, responding to the interactively generated segmentation until the result is satisfactory
Figure 5. The user surveys the input scene (a), and chooses a radius that includes the object to segment (b). The user creates several
foreground constraints (green circles), and a segmentation is interactively performed, with foreground points shown in blue (c). If necessary,
the user adds additional constraints (background constraints in red), until the segmentation is satisfactory (d). The user has the option of
toggling between views of all points (as shown here), or only background or foreground points, to make sure the object is not over or under
segmented.
6. Results
In this section, we first describe the data used for testing
our prototype system. We then describe two alternative segmentation algorithms, and show some example results and
comparisons. Finally, we perform a quantitative evaluation
of our algorithm.
6.1. Data
We tested our segmentation algorithm on the LIDAR
scan and dataset described in [6], which covers about 6
square kilometers of Ottawa, Canada. The truthed part of
that scan covers about 300,000 square meters with about
100 million points, and contains about 1000 objects of interest placed by BAE Systems. These objects of interest form
the basis of our quantitative evaluation.
The scans were collected by Neptec. They were collected from four car-mounted TITAN scanners facing left,
right, forward-up, and forward-down, and from an airborne
scanner. The scans were merged and provided to us as a
single point cloud, with a position, intensity, and color per
point. The colors from car-mounted scanners were not very
accurate, so we focused on geometry as the only cue for
segmentations in this paper. The reported alignment error
between ground-based and air-based scans is 0.05 meters,
with a reported vertical accuracy of 0.04 meters.
6.3. Examples
In this section, we show several example segmentations
created with our method as well as with alternatives. While
a more complete, quantitative comparison is performed in
the next section, these examples provide useful intuition.
Figure 6 contains segmentations of several objects, including a car, several lamp posts, a sign, and a trash can.
The first column has the ground truth segmentation created with our interactive tool; the remaining columns show the alternative segmentations described in the caption of Figure 6.
Figure 6. Example segmentations. Each row has an object with ground truth segmentation, followed by an all-points segmentation, two
connected component segmentations with different spacings, two min-cut segmentations with different static background radii, and a min-cut segmentation with automatically chosen background radius. While connected components and min-cut with static background radii are
sometimes successful, the min-cut segmentation with automatic background radius is more robust to clutter and varying object sizes.
Figure 7. Example failures of the automatic segmentation algorithm. In (a) the control box (on the right) cannot be separated
from a close light standard (on the left). In (b), only a part of a car
is returned. In (c), a lamp post is not separated from a close roof.
6.4. Evaluation
Using the ground truth segmentations we created with
our interactive segmentation tool, we are able to quantitatively evaluate the performance of our segmentation algorithm, and compare to the alternatives.
We gather statistics as follows. For each object of interest, we run a segmentation algorithm, and record its precision (ratio of correctly predicted foreground points to the
total number of predicted foreground points) and recall (ratio of correctly predicted foreground points to the number
of ground truth foreground points). A high precision indicates that most of the predicted foreground points are in the
object, and a high recall indicates that most of the object
points have been predicted to be in the foreground.
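The per-object statistics can be computed directly from the predicted and ground-truth point index sets; this helper is a straightforward transcription of the definitions above.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a predicted foreground point set
    against a ground-truth foreground point set."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # correctly predicted foreground points
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```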
Table 1 contains the results, averaged first by object class
and then overall, for the segmentation algorithms shown in
Figure 6: all points, two settings of connected components,
two settings of min-cut with static radius, and min-cut with
an automatically chosen background radius. Some objects
are easier to segment: parking meters are often isolated, so
both connected component and min-cut algorithms perform
well. Other objects, such as trash cans, are often close to
background clutter, so the precision is lower. Min-cut algorithms are able to raise both precision and recall for trash
cans relative to connected components. Likewise, min-cut
algorithms improve performance significantly for cars and
signs. Other objects, such as newspaper boxes (an example
of one is shown in Figure 5) are very close to each other, so
while our algorithm improves the precision, it remains low.
Overall, as expected, the all points algorithm has a relatively high recall at the cost of low precision. The two
connected component algorithms have a higher precision,
and the min-cut algorithms improve on this performance.
Figure 8. Precision-recall plots of our algorithm compared to several alternatives. The all points algorithm is shown in blue at
varying radii. The connected components algorithm is shown in
red with varying spacing. The min-cut algorithm is shown in
green with varying statically chosen background radii. Finally, the
min-cut algorithm with automatically chosen background radius
is shown in purple with varying cut cost thresholds. The performance improves in the order of the algorithms presented.
The min-cut algorithm with automatic radius has better performance than the two shown settings of the static-radius
version. This last point is more apparent in Figure 8 (replicated from [6]), which shows the precision-recall curves resulting from running the above segmentation algorithms at
several settings. Specifically, it shows the all-points algorithm at varying radii (blue), connected components at varying spacing (red), min-cut with varying static background
(green), and min-cut with automatically chosen radius at
varying thresholds (purple). Comparing the last two curves
shows the improvement in performance made by automatically choosing the background radius.
7. Conclusion
In this paper, we presented a graph-cut based method for
segmenting objects in point clouds. We showed how our
method can be adapted for both automatic and interactive
segmentation. We used the interactive version of our algorithm to generate a truth set of about 1000 segmentations,
and used this truth set to quantitatively evaluate the automatic algorithm, comparing to two alternatives.
There are two immediate directions for future work.
First, we can augment our algorithm with more cues to
make it more effective. One can fit geometric primitives,
such as planes and cylinders, to the data, and augment our
algorithm with the observation that since many urban objects are man-made, points belonging to the same primitive
are likely to belong to the same object. Similarly, one can
add cues such as convexity (object parts are likely to be convex) and symmetry (many objects exhibit strong symmetry,
so segmentations ought to be symmetric as well).
Second, our segmentation algorithm is likely to be a step in a recognition system, as shown in [6]. While our segmentations can inform recognition, feedback in the other direction is also likely to be useful: once an object type has been proposed, our algorithm is simple to adjust to account for the prior shape of that type, so that a more accurate segmentation can be generated.
Table 1. Precision (Pr) and recall (Re), in percent, for the segmentation algorithms of Figure 6: all points (r = 2), connected components (r = 2; s = .08 and s = .1), min-cut with static background radius (r = 2m and r = 4m), and min-cut with automatically chosen radius.

Class                 # in     All Pts   ConComp   ConComp   Min-Cut   Min-Cut   Min-Cut
                      Truth    (r = 2)   (s=.08)   (s=.1)    (r = 2m)  (r = 4m)  (auto r)
                      Area     Pr  Re    Pr  Re    Pr  Re    Pr  Re    Pr  Re    Pr  Re
Short Post             338     13  99    89  99    86  99    93  98    82  99    92  99
Car                    238     77  75    93  47    91  59    93  20    92  82    92  77
Lamp Post              146     60  99    82  95    79  97    89  96    86  99    89  98
Sign                    96     36 100    73  74    68  97    84  98    73 100    83 100
Light Standard          58     68  93    84  92    83  92    92  86    91  92    91  92
Traffic Light           42     58  75    75  75    72  75    92  72    84  87    84  86
Newspaper Box           37     13 100    15  96    14 100    40  86    21 100    38  93
Tall Post               34     35 100    42  89    42  96    79  84    46 100    58  96
Fire Hydrant            20     36 100    81  89    81  95    89 100    82 100    88 100
Trash Can               19     17 100    48  93    43  94    57 100    54 100    60 100
Parking Meters          10     14 100   100  98   100  99   100 100   100 100   100 100
Traffic Control Box      7     19 100    82  96    79  99    79 100    68 100    80 100
Recycle Bins             7     46 100    71  94    64  99    92  99    80 100    92 100
Advertising Cylinder     6     70 100    79  83    79  83    97 100    89 100    96 100
Mailing Box              3     48 100    86 100    86 100    98 100    98 100    98 100
A-frame                  2     59 100    70  50    69 100    87 100    69 100    86 100
All                   1063     43  93    82  84    79  88    89  78    81  95    86  93
8. Acknowledgments
We thank Neptec, John Gilmore, and Wright State University for providing the 3D LIDAR data set. This work
started as part of DARPA's URGENT program, and we
thank BAE Systems for including us in the project, especially Erik Sobel, Matt Antone, and Joel Douglas. Aleksey Boyko, Xiaobai Chen, Forrester Cole, Vladimir Kim,
and Yaron Lipman provided valuable ideas, and Kristin
and Kelly Hageman helped with ground truthing. Finally,
we thank NSF (CNFS-0406415, IIS-0612231, and CCF-0702672) and Google for providing funding.
References
[1] Y. Boykov and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. IJCV, 70(2):109–131, 2006.
[2] B. Chazelle, D. Dobkin, N. Shouraboura, and A. Tal. Strategies for polyhedral surface decomposition: An experimental study. Computational Geometry: Theory and Applications, 7(4-5):327–342, 1997.
[3] J. Fransens and F. Van Reeth. Hierarchical PCA decomposition of point clouds. In Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission, pages 591–598, 2006.
[4] M. Garland, A. Willmott, and P. Heckbert. Hierarchical face clustering on polygonal surfaces. In ACM Symposium on Interactive 3D Graphics, pages 49–58, 2001.
[5] N. Gelfand and L. Guibas. Shape segmentation using local slippage analysis. In Symposium on Geometry Processing, pages 214–223, 2004.
[6] A. Golovinskiy, V. G. Kim, and T. Funkhouser. Shape-based recognition of 3D point clouds in urban environments. ICCV, September 2009.
[7] Z. Ji, L. Liu, Z. Chen, and G. Wang. Easy mesh cutting. In Eurographics, volume 25, 2006.
[8] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. IJCV, 1(4):321–331, 1988.