Weakly Supervised Semantic Segmentation For Large-Scale Point Cloud

Yachao Zhang¹, Zonghao Li¹, Yuan Xie²*, Yanyun Qu¹, Cuihua Li¹, Tao Mei³
¹ School of Informatics, Xiamen University, Fujian, China
² School of Computer Science and Technology, East China Normal University, Shanghai, China
³ JD AI Research, Beijing, China
{yachaozhang,zonghaoli}@stu.xmu.edu.cn, yxie@cs.ecnu.edu.cn, {yyqu,chli}@xmu.edu.cn, tmei@jd.com
arXiv:2212.04744v1 [cs.CV] 9 Dec 2022
Abstract

Existing methods for large-scale point cloud semantic segmentation require expensive, tedious and error-prone manual point-wise annotations. Intuitively, weakly supervised training is a direct solution to reduce the cost of labeling. However, for weakly supervised large-scale point cloud semantic segmentation, too few annotations will inevitably lead to ineffective learning of the network. We propose an effective weakly supervised method containing two components to solve this problem. Firstly, we construct a pretext task, i.e., point cloud colorization, with self-supervised learning to transfer the prior knowledge learned from a large amount of unlabeled point clouds to a weakly supervised network. In this way, the representation capability of the weakly supervised network can be improved by the guidance from a heterogeneous task. Besides, to generate pseudo labels for unlabeled data, a sparse label propagation mechanism is proposed with the help of generated class prototypes, which are used to measure the classification confidence of unlabeled points. Our method is evaluated on large-scale point cloud datasets covering both indoor and outdoor scenarios. The experimental results show a large gain over existing weakly supervised methods and results comparable to fully supervised methods.¹

* Corresponding author.
¹ Code based on MindSpore: https://github.com/dmcv-ecnu/MindSpore_ModelZoo/tree/main/WS3_MindSpore

Figure 1: Visualization of semantic segmentation results; misclassified points are marked in red. From left to right, the columns show the results with one labeled point, 1% labeled points, 10% labeled points and the ground truth.

Introduction

3D scene understanding is required in numerous applications, in particular robotics, autonomous driving and virtual reality. Large-scale point cloud semantic segmentation, as a fundamental task, attracts more and more attention. The success of deep neural networks in point cloud semantic segmentation is attributed to their ability to scale up with more well-labeled training data (Qi et al. 2017a,b; Zhang, Hua, and Yeung 2019; Li, Chen, and Hee Lee 2018; Wang et al. 2019; Wu, Qi, and Fuxin 2019; Yang et al. 2019; Hu et al. 2020).

Fully supervised point cloud semantic segmentation methods need expensive, tedious and error-prone manual point-wise annotations. One direct solution is to achieve effective segmentation via weakly supervised learning, annotating only part of the points or only semantic categories, which is still in its infancy. Xu and Lee proposed a weakly supervised approach (Xu and Lee 2020) whose results are close to the previous fully supervised performance with 10× fewer labeled points. This approach only deals with an instance (ShapeNet-Part) or a block (1m × 1m on S3DIS) of point cloud at a small scale.

For large-scale point clouds, the network cannot effectively learn feature representation with a few labeled points. We consider two questions to solve this problem: 1) What prior knowledge can be transferred to the segmentation network to improve feature representation in weakly supervised learning? 2) How can the labels of the given points be propagated to unlabeled points effectively?

Firstly, transfer learning provides a promising way and is relatively successful in 2D vision tasks, such as image classification and segmentation. But point cloud tasks lack a well-annotated and category-extensive dataset like ImageNet (Deng et al. 2009) for pre-training. We therefore consider self-supervised learning for knowledge transfer. However, how to use enormous amounts of unlabeled point clouds to generate labels by themselves and learn a semantically related representation for subsequent tasks is very challenging. Secondly, propagating pseudo labels to unlabeled points is a common way for weakly supervised and semi-supervised tasks to learn effectively. Usually, this requires constructing a fully-connected graph with all points. However, for large-
scale point clouds ($\sim 10^6$ points), the fully-connected graph is unachievable due to large GPU memory consumption.

To address the above difficulties, we present an effective weakly supervised method for large-scale point cloud semantic segmentation. Firstly, we notice that points of the same semantic class have similar color distributions, and point clouds with color are essentially free. We choose point cloud colorization as a pretext task for transfer learning. Specifically, we use color space transformation to construct a self-supervised network for color space completion. We further introduce a local perceptual regularization term to enhance the local representation, which is consistent with the goal of the segmentation task. As a result, the network can learn a prior-based initialization distribution. After that, we use the pre-trained knowledge to fine-tune the weakly supervised semantic segmentation network. Our learning scheme transfers the knowledge from the self-supervised task to the weakly supervised task and improves the effectiveness of feature representation. Moreover, we propose a sparse label propagation method induced by class prototypes, which can gradually propagate pseudo labels to unlabeled points. It is computationally friendly and can be adapted to large-scale tasks.

Our contributions can be summarized as follows:
• A weakly supervised segmentation method is proposed for large-scale point clouds which only needs a very small number of labeled points.
• We adopt the heterogeneous transfer learning method and construct a self-supervised pretext task by point cloud colorization. It can learn a prior distribution and generalizes well to segmentation tasks.
• We propose an efficient sparse label propagation method which can propagate labels to unlabeled points. It expands the supervision information and has low computational complexity.
• Extensive experimental results demonstrate that our method achieves performance comparable to, or even exceeding, that of fully supervised competitors.

Related Work

Semantic segmentation for large-scale point cloud. PointNet and PointNet++ (Qi et al. 2017a,b) are pioneering approaches for point clouds. While recent works have shown promising results on small point clouds, most of them cannot directly scale up to large-scale point clouds ($\sim 10^6$ points) due to high computational and memory costs (Hu et al. 2020).

SPG (Landrieu and Simonovsky 2018) processes large-scale point clouds through a graph convolution-based method. FCPN (Rethage et al. 2018) preprocesses large-scale point clouds into voxels. However, both graph partitioning and voxelization are computationally expensive. Recently, RandLA-Net (Hu et al. 2020) provides an efficient and lightweight neural architecture for large-scale point cloud semantic segmentation. The state-of-the-art methods mentioned above are fully supervised and require a large number of point clouds with dense point-wise annotation. Obviously, this point-wise annotation is labor-intensive and time-consuming.

Self-supervised learning on point cloud. Self-supervised learning has recently sprung up in computer vision. It aims to learn good representations from unlabeled visual data, reducing or even eliminating the need for the costly collection of manual labels (Newell and Deng 2020). Recent self-supervised learning has achieved great success in feature representation, with results comparable to or better than those produced by supervised pre-training (Bachman, Hjelm, and Buchwalter 2019; He et al. 2020).

Unlike in 2D images, self-supervised learning is rarely used in point clouds. Previous works on unsupervised 3D representation learning (Achlioptas et al. 2018; Gadelha, Wang, and Maji 2018; Hassani and Haley 2019; Li, Chen, and Hee Lee 2018; Sauder and Sievers 2019; Yang et al. 2018) mainly focus on representing a single instance (ShapeNet (Chang et al. 2015)). It is difficult to directly apply the feature representation learned on an instance to real-world large-scale point cloud tasks due to the large domain gaps (Xie et al. 2020). PointContrast (Xie et al. 2020) focuses on contrastive embeddings and proposes a pre-training task for 3D point cloud understanding. The core of this method is that different views of a point cloud should be mapped to similar embeddings for matched points. It achieves promising results on fully supervised downstream tasks. Xu and Lee (Xu and Lee 2020) proposed a Siamese self-supervision branch that augments the training sample with a random in-plane rotation and flipping, and then makes the original and augmented point-wise predictions consistent. This method (Xu and Lee 2020) treats self-supervision as a branch of a multi-task network and only uses the current training samples. Therefore, it cannot make full use of other massive point clouds to learn representations with strong generalization.

Weakly Supervised Point Cloud Semantic Segmentation. There is little research on weakly supervised point cloud semantic segmentation. Following the weakly supervised manner called incomplete supervision in (Zhou 2018), the recent work (Xu and Lee 2020) utilizes 10× fewer labeled points to achieve performance comparable to fully supervised methods in small-scale part segmentation and semantic segmentation of small blocks ($\sim 10^3$ points). MPRM (Wei et al. 2020) introduces a multi-path region mining module to generate pseudo point-level labels from a classification network. The classification network is trained with subcloud-level weak labels. With a whole scene as input, the performance degrades. Up to now, none of these methods generalizes well to large-scale problems.

Proposed Method

Overview

Our method aims to exploit knowledge transfer and label propagation to solve the problem of unstable and poor representation produced by the network under weakly supervised large-scale point cloud segmentation. We propose an effective weakly supervised large-scale method and depict the overview framework in Figure 2.
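As a rough illustration of the scale argument made in the introduction (a fully-connected graph over $\sim 10^6$ points does not fit in GPU memory), the following back-of-the-envelope sketch compares the storage of a dense N × N affinity matrix with that of an N × C point-to-prototype matrix. The function names, the 4-byte float32 entries, and C = 13 (the number of S3DIS classes) are assumptions chosen for illustration; they are not taken from the paper.

```python
def dense_affinity_bytes(num_points: int, bytes_per_entry: int = 4) -> int:
    """Bytes for a dense num_points x num_points affinity matrix."""
    return num_points * num_points * bytes_per_entry


def prototype_similarity_bytes(num_points: int, num_classes: int,
                               bytes_per_entry: int = 4) -> int:
    """Bytes for a num_points x num_classes point-to-prototype matrix."""
    return num_points * num_classes * bytes_per_entry


N = 10**6  # order of magnitude of points in one large-scale scene
C = 13     # assumption: number of semantic classes (13 in S3DIS)

dense = dense_affinity_bytes(N)            # 4 * 10**12 bytes, about 4 TB
sparse = prototype_similarity_bytes(N, C)  # 52 * 10**6 bytes, about 52 MB

print(f"dense graph:      {dense / 1e12:.1f} TB")
print(f"prototype matrix: {sparse / 1e6:.1f} MB")
```

Under these assumptions the dense matrix alone needs terabytes of memory, while the prototype-based matrix needs tens of megabytes, which is why a class-prototype formulation scales to whole scenes.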
Figure 2: The framework of our method consists of three parts: i) The self-supervised pretext task learns prior knowledge. ii) The prior knowledge is used to fine-tune the weakly supervised semantic segmentation network. iii) Sparse label propagation generates pseudo labels for unlabeled data to improve the effectiveness of the weakly supervised task.

We take point cloud colorization as a self-supervised pretext task to learn a prior-based initialization distribution. A local perceptual regularization is proposed to learn the contextual information. Then, we use the pre-trained parameters of the encoder to initialize the weakly supervised network to improve the effectiveness of feature representation.

Furthermore, we leverage labeled points to directly supervise the network and fine-tune the network parameters. We also introduce a non-parametric label propagation method for weakly supervised semantic segmentation. Some unlabeled points are assigned pseudo labels through the similarity between class prototypes (Li et al. 2020; Qiu, Yao, and Mei 2017) and embeddings of unlabeled points. Therefore, more supervised information is introduced to improve the effectiveness of training. Considering computational and memory efficiency for large-scale point clouds, we choose RandLA-Net (Hu et al. 2020) as the backbone, which is an efficient and lightweight neural architecture for large-scale point cloud semantic segmentation. In the following, we describe the self-supervised pretext task and the sparse label propagation method.

Self-Supervised Pretext Task

In 2D vision tasks, colorization provides a powerful supervisory signal compared with training from scratch. Training data are easy to collect, since any point cloud with color can be used as training data. Due to the progress of point cloud acquisition equipment, we have access to enormous amounts of unlabeled point cloud data with color information. We investigate and implement self-supervised learning on point cloud colorization, which is treated as a pretext task. Point colorization aims to guide the self-supervised model to learn the feature representation.

It is recognized that the Lab color space is favorable for perceptual distance (Zhang, Isola, and Efros 2016; Larsson, Maire, and Shakhnarovich 2017). Thus, we perform point cloud colorization by a, b completion in this color space. Given the lightness channel L, the network predicts the a and b color channels and the local Gaussian distribution for each point. Notably, the value in channel L is replicated three times for each point to keep the same dimension as the input of the segmentation task. Therefore, the input point cloud $X^s = [x_1, x_2, ..., x_{N^s}] \in \mathbb{R}^{N^s \times 6}$ consists of $N^s$ 3D points with the xyz coordinates and three copies of L. $N^s$ is the number of points in one point cloud.

Moreover, we implement RandLA-Net on the self-supervised task by modifying the final output layer. That is, the output of the network is a 6-dimensional vector which contains the predicted $\hat{a}$, $\hat{b}$ and the corresponding local mean and variance.

Loss of Self-Supervised Task The loss is inherited from the standard regression problem, which minimizes the L1 error between the prediction and the ground truth. Given a point cloud with coordinates and triplicate lightness values, the self-supervised pretext task learns a mapping $\hat{Y} = \mathcal{F}(X^s; \Theta)$, $\hat{Y} = \{\hat{a}, \hat{b}, \hat{\mu}^a, \hat{\sigma}^a, \hat{\mu}^b, \hat{\sigma}^b\}$, where $\hat{a}$, $\hat{b}$, $\hat{\mu}$ and $\hat{\sigma}$ denote the predicted a, b and the corresponding local mean and variance, respectively. The loss of the self-supervised task can be formulated as:

$$L_{ab} = \frac{1}{2N^s}\sum_{i=1}^{N^s}\left(\|a_i - \hat{a}_i\|_1 + \|b_i - \hat{b}_i\|_1\right). \tag{1}$$

In addition, to learn the local color distribution of every point, we introduce a local perceptual regularization term. If the network can predict the color distribution (mean and variance) of the neighbors, it can embed the local information, which is consistent with the segmentation task using
local features for weakly supervised semantic segmentation. Given a point $x_i$ as the centroid, the local neighborhood $\mathcal{N}(x_i)$ is calculated by KNN according to the Euclidean distance. The ground truth $\mu_i^a$ and $\sigma_i^a$ of the a channel can be obtained by:

$$\mu_i^a = \frac{1}{K}\sum_{j=1}^{K} a_j, \quad \forall x_j \in \mathcal{N}(x_i), \tag{2}$$

$$\sigma_i^a = \sqrt{\frac{1}{K}\sum_{j=1}^{K}\left(a_j - \mu_i^a\right)^2 + \varepsilon}, \quad \forall x_j \in \mathcal{N}(x_i), \tag{3}$$

where $\varepsilon$ is a very small constant. $\mu_i^b$ and $\sigma_i^b$ can be obtained in the same way. We formulate the local perceptual regularization term as:

$$L_{local} = \frac{1}{4N^s}\sum_{i=1}^{N^s}\left(\|\mu_i^a - \hat{\mu}_i^a\|_1 + \|\sigma_i^a - \hat{\sigma}_i^a\|_1 + \|\mu_i^b - \hat{\mu}_i^b\|_1 + \|\sigma_i^b - \hat{\sigma}_i^b\|_1\right). \tag{4}$$

The total loss $L_p$ of the self-supervised pretext task can be expressed as:

$$L_p = L_{ab} + L_{local}. \tag{5}$$

Discussion Why is the knowledge learned from the pretext task beneficial for semantic segmentation?

• The pretext task learns feature distributions similar to those of the semantic segmentation task. Objects in the same category usually have similar color distributions; for example, vegetation is typically green and roads are black. The surface color texture of the scene provides ample cues for many categories.
• The pretext task embeds the local feature representation. We introduce a local perceptual regularization term to constrain the local color distribution to be consistent with the original distribution. This allows the network to embed more local information, and may therefore enhance the embedding of local features for the semantic segmentation task.

Sparse Label Propagation

Segmentation performance degrades significantly with few labeled points. The main reason is that the supervisory information provided by few labeled points cannot be propagated well to unlabeled points. Therefore, we use the labeled points to assign pseudo labels to unlabeled points, which provides additional supervised information to improve the representation of the weakly supervised network.

To achieve this goal, the following items need to be taken into account: 1) Computational complexity and memory consumption should be low. Large-scale point clouds usually contain $\sim 10^6$ points; if all the points are used as nodes to construct a fully-connected graph, it will consume a lot of memory and computing resources. 2) The anchor points should be sparse. Ambiguous points should not be given labels to train the network. 3) The propagated labels should be soft. They should be related to the similarity: the higher the similarity, the more similar the labels should be.

We design a sparse label propagation method. The overall framework is shown in Figure 3. It consists of three parts: class prototype generation, class assignment matrix construction, and sparse pseudo-label generation.

Figure 3: The framework of sparse label propagation. ⊙ is the Hadamard product and Gather denotes the operator of getting elements by index.

Class prototype generation. In the last two layers of the network, we output the embedding $Z = [z_1^l, z_2^l, ..., z_M^l; z_1^u, z_2^u, ..., z_N^u] \in \mathbb{R}^{(M+N)\times d}$ and the corresponding prediction $Y = [y_1^l, y_2^l, ..., y_M^l; y_1^u, y_2^u, ..., y_N^u] \in \mathbb{R}^{(M+N)\times C}$, which come from M labeled points and N unlabeled points. We use $Z_l$, $Z_u$ to represent the embeddings of labeled and unlabeled points, respectively. Firstly, we generate C prototypes to represent the C classes according to the labeled points. Specifically, we simply take the mean of the labeled point embeddings $Z_l$ for each class. For class c, the prototype $\rho_c$ is given by:

$$\rho_c = \frac{1}{|I_c|}\sum_{z_i^l \in I_c} z_i^l, \tag{6}$$

where $I_c$ denotes the set of embeddings of the labeled points of class c, $c \in \{1, 2, ..., C\}$.

Class assignment matrix construction. We leverage the embedding $Z_u$ of unlabeled points and the class prototypes to
construct a similarity matrix $W \in \mathbb{R}^{N \times C}$ by:

$$W_{ic} = \exp\left(-\frac{\|z_i^u - \rho_c\|_2}{\sigma}\right), \tag{7}$$

where $\sigma$ is a hyper-parameter. Each row of W holds the similarity between one unlabeled point and each of the class prototypes. We use softmax to convert the similarity into a class assignment probability S as:

$$S_{ic} = P(i \mapsto c) = \frac{\exp(W_{ic})}{\sum_{c'=1}^{C}\exp(W_{ic'})}. \tag{8}$$

Sparse pseudo label generation. Some points have low similarity to every category. These points are not suitable to provide supervisory information to train the network. Specifically, for each class, according to the class assignment matrix S, we select the top-K unlabeled points and get the mask $M^k \in \{0, 1\}^{N \times C}$, where $m_{ic}^k = 1$ means that the embedding of the i-th point is among the K points most similar to class c. N is the number of unlabeled points. This is a label expansion method with a balanced number of points per category, which alleviates category imbalance to a certain extent. As an unlabeled point may belong to multiple categories, we choose the most similar category and generate a binary mask. According to $M^k$, we get the point mask $M^{pt} \in \{0, 1\}^N$, where $m_i^{pt} = 1$ denotes that the i-th point is assigned a pseudo label. We can get the sparse pseudo label $Y^p \in \mathbb{R}^{N \times C}$ by:

$$Y^p = M^{pt} \odot S, \tag{9}$$

where $\odot$ denotes the Hadamard product. $Y^p$ takes a soft one-hot form. The label propagation loss $L_{sp}$ can be formulated as:

$$L_{sp} = -\frac{1}{\|M^{pt}\|_1}\sum_{i=1}^{N} m_i^{pt}\sum_{c=1}^{C} y_{ic}^p \log y_{ic}^u, \tag{10}$$

where $y_{ic}^u$ is the probability that the unlabeled point i is classified as category c, and $\|\cdot\|_1$ denotes the L1 norm.

Compared with the traditional fully-connected graph label propagation method, our method is computationally efficient. The complexity of our method is $O(NCd)$, while that of the fully-connected graph method is $O((N + M)^2 d)$. C is the number of categories and d denotes the dimension of $Z_u$. The magnitude of C is $\sim 10^1$, which is much smaller than N ($\sim 10^6$).

Loss of Weakly Supervised Task

The loss of weakly supervised semantic segmentation includes two terms, the segmentation loss and the label propagation loss:

$$L_{total} = L_{seg} + \lambda L_{sp}. \tag{11}$$

We utilize a softmax cross-entropy loss on the labeled points. For labeled points, the segmentation loss is formulated as:

$$L_{seg} = -\frac{1}{M}\sum_{i=1}^{M}\sum_{c=1}^{C} y_{ic}\log\frac{\exp(y_{ic}^l)}{\sum_{c'=1}^{C}\exp(y_{ic'}^l)}, \tag{12}$$

where $y_{ic}$ is the ground truth of labeled point i and M denotes the number of labeled points.

In the early stage of training, the embedding is unreliable, and class prototypes generated from the embedding cannot represent the classes well. Thus, the label propagation loss should not be introduced to optimize the network parameters. As the embedding gets better, the weight of the pseudo-label loss should increase. Therefore, we introduce a non-linear parameter $\lambda$ to balance the two losses. In this work, $\lambda$ is formulated as:

$$\lambda = \begin{cases} 0, & \text{epoch} < 30 \\ e^{\frac{\text{epoch}}{\text{max\_epoch}} - 1}, & \text{otherwise} \end{cases} \tag{13}$$

where epoch and max_epoch denote the current training epoch and the total number of epochs, respectively.

Experiments and Analysis

In this section, we first introduce the experimental settings. Then, we evaluate the weakly supervised semantic segmentation performance on indoor and outdoor large-scale point clouds, respectively. Furthermore, we perform an ablation study to evaluate the importance of the main components.

Experiment Settings

Dataset Setting for Self-supervised Task Previous works on unsupervised 3D representation learning (Achlioptas et al. 2018; Gadelha, Wang, and Maji 2018; Hassani and Haley 2019; Li, Chen, and Hee Lee 2018; Sauder and Sievers 2019; Yang et al. 2018) mainly focused on ShapeNet (Chang et al. 2015), which is a dataset of single-object CAD point cloud models. However, a real-world large-scale point cloud usually contains multiple objects of different categories. Pre-training on ShapeNet scales poorly because of the large domain gap. We choose ScanNet (Dai et al. 2017) as the pre-training dataset, which is a large real-world point cloud dataset containing 2.5 million views in more than 1500 indoor scans.

To train the self-supervised pretext task, we convert the RGB space to the Lab space and split the 6 channels corresponding to the coordinates (x, y, z) and color (L, a, b) into the given channels (x, y, z, L) and the prediction channels (a, b).

Dataset Setting for Weakly Supervised Segmentation To evaluate the performance of our network on weakly supervised semantic segmentation tasks, we experiment on the indoor scene datasets S3DIS (Armeni et al. 2016) and ScanNetv2 (Dai et al. 2017) and the outdoor dataset Semantic3D (Hackel et al. 2017). Note that our method is pre-trained on the ScanNet dataset. However, the pre-training task does not use the semantic labels.

Implementation Details The following weakly supervised settings are studied. i) 1 point label (1pt): we assume that only one point within each category is labeled with ground truth for each scene. ii) (x%) denotes that x percent of the points are labeled with ground truth. We set x = {1, 10} in our experiments. iii) The cost of labeling 1% of the points manually is much higher than 1pt. We consider a cheap alternative: super-point (SPT), which annotates a local area instead of one point. In all settings, the annotated points are randomly selected.

Our pre-training and weakly supervised semantic segmentation networks are built on the efficient backbone of RandLA-Net.
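To make the sparse label propagation steps of Eqs. (6) to (10) and the weighting of Eq. (13) more concrete, here is a minimal pure-Python sketch on toy 2-D embeddings. It is not the authors' TensorFlow or MindSpore implementation. All function names, the toy data, `sigma = 1.0`, and `top_k = 1` are hypothetical choices made for illustration. Two readings are assumed: a point enters the point mask if it is among the top-K most confident points of any class, and the exponent in Eq. (13) is epoch / max_epoch − 1.

```python
import math


def class_prototypes(z_labeled, labels, num_classes):
    """Eq. (6): mean embedding of the labeled points of each class."""
    dim = len(z_labeled[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for z, c in zip(z_labeled, labels):
        counts[c] += 1
        for k in range(dim):
            sums[c][k] += z[k]
    # Assumption: every class has at least one labeled point.
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def assignment_matrix(z_unlabeled, prototypes, sigma=1.0):
    """Eqs. (7)-(8): similarity to each prototype, then a row-wise softmax."""
    S = []
    for z in z_unlabeled:
        w = [math.exp(-math.dist(z, p) / sigma) for p in prototypes]
        e = [math.exp(v) for v in w]
        total = sum(e)
        S.append([v / total for v in e])
    return S


def sparse_pseudo_labels(S, top_k):
    """Eq. (9): keep only points that are in the top_k of some class."""
    n, num_classes = len(S), len(S[0])
    point_mask = [0] * n
    for c in range(num_classes):
        ranked = sorted(range(n), key=lambda i: S[i][c], reverse=True)
        for i in ranked[:top_k]:
            point_mask[i] = 1
    # Rows of unselected points are zeroed out (Hadamard product with mask).
    Y_p = [[m * s for s in row] for m, row in zip(point_mask, S)]
    return point_mask, Y_p


def propagation_loss(point_mask, Y_p, Y_u, eps=1e-12):
    """Eq. (10): soft cross-entropy averaged over the selected points."""
    selected = sum(point_mask)
    if selected == 0:
        return 0.0
    loss = 0.0
    for m, yp, yu in zip(point_mask, Y_p, Y_u):
        if m:
            loss -= sum(p * math.log(q + eps) for p, q in zip(yp, yu))
    return loss / selected


def lambda_weight(epoch, max_epoch, warmup=30):
    """Eq. (13): zero during warm-up, then exp(epoch / max_epoch - 1)."""
    if epoch < warmup:
        return 0.0
    return math.exp(epoch / max_epoch - 1.0)


# Toy example: 4 labeled points in 2 classes, 3 unlabeled points.
z_l = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
labels = [0, 0, 1, 1]
z_u = [[0.1, 0.4], [3.9, 4.6], [2.0, 2.5]]  # the last point is ambiguous

prototypes = class_prototypes(z_l, labels, num_classes=2)
S = assignment_matrix(z_u, prototypes, sigma=1.0)
point_mask, Y_p = sparse_pseudo_labels(S, top_k=1)
# For illustration only, the assignment S stands in for the network output.
loss_sp = propagation_loss(point_mask, Y_p, S)
```

In this toy run the ambiguous third point lies at equal distance from both prototypes, so it is never among the most confident points of a class and receives no pseudo label. The cost is one distance per point-prototype pair, which matches the O(NCd) complexity stated above.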
Both networks are implemented in TensorFlow and can be trained on a single NVIDIA Titan RTX. We use the Adam optimizer with default parameters. The number of neighboring points K is 16. Both the self-supervised pretext task and the weakly supervised semantic segmentation network are trained for 80 epochs. We calculate the mean Intersection over Union (mIoU) to evaluate the performance.

Evaluation on S3DIS

We compare our method with state-of-the-art weakly supervised (Weakly) and fully supervised (Fully) methods on the indoor dataset S3DIS. The former are: Π Model (Laine and Aila 2016), MT (Tarvainen and Valpola 2017) and Xu (Xu and Lee 2020). The latter are: PointNet (Qi et al. 2017a), DGCNN (Wang et al. 2019), RSNet (Huang, Wang, and Neumann 2018), PointCNN (Li et al. 2018), ShellNet (Zhang, Hua, and Yeung 2019) and RandLA-Net (Hu et al. 2020).

Setting        Method             Area5    6-fold
Fully          PointNet ['17]     41       47.6
Fully          DGCNN ['19]        -        56.1
Fully          RSNet ['18]        56.5     -
Fully          PointCNN ['18]     57.3     65.4
Fully          ShellNet ['19]     -        66.8
Fully          RandLA-Net ['20]   62.8*    70.0
1pt (0.2%)     Π Model ['16]      44.3     -
1pt (0.2%)     MT ['17]           44.4     -
1pt (0.2%)     Xu ['20]           44.5     -
0.2%           Ours               56.4     -
1pt (0.03%)    Baseline           40.4     -
1pt (0.03%)    Ours               45.8     -
1%             Ours               61.8     65.9
10%            Π Model ['16]      46.3     -
10%            MT ['17]           47.9     -
10%            Xu ['20]           48.0     -
10%            Ours               64.0     68.1

Table 1: Comparisons of performance on S3DIS (Armeni et al. 2016) (mIoU %). Area5 denotes evaluation on Area 5. 6-fold is cross evaluation of 6 areas. The superscript * indicates the result evaluated by the official code.

The results are presented in Table 1. Overall, we observe a consistent improvement in segmentation performance with more labeled points. Our 1pt setting means only one labeled point for each category in an entire room instead of in small blocks (e.g., 1×1 meter). Therefore, our 1pt setting corresponds to about 0.03% labeled points in total, while the labeled points of (Xu and Lee 2020) amount to about 0.2%. Ours still achieves better performance than Xu (Xu and Lee 2020) at the 1pt setting. Under the 1% setting, we achieve performance comparable to the fully supervised method RandLA-Net, and even exceed the earlier fully supervised methods by a large margin. At the 10% setting, ours is clearly superior to the previous weakly supervised methods and even achieves a 1.2% improvement over RandLA-Net (Hu et al. 2020) on Area 5. These results show the effectiveness of our method.

The qualitative results on the S3DIS dataset under the 1pt, 1% and 10% settings are shown in Figure 1. It can be seen that at the 1% setting, our method classifies points correctly except at challenging boundaries.

Evaluation on ScanNetv2

We compare our method with four fully supervised methods: PointNet++ (Qi et al. 2017b), SPLATNet (Su et al. 2018), PointCNN (Li et al. 2018) and KPConv (Thomas et al. 2019), and a weakly supervised method: MPRM (Wei et al. 2020). Table 2 shows the comparison results on the test set.

Setting    Method                   mIoU
Fully      PointNet++ ['17]         33.9
Fully      SPLATNet ['18]           39.3
Fully      PointCNN ['18]           45.8
Fully      KPConv ['19]             68.4
Weakly     MPRM (subcloud) ['20]    41.1
Weakly     Ours (SPT)               49.0
Weakly     Ours (1%)                51.1
Weakly     Ours (10%)               52.0

Table 2: Quantitative results on ScanNetv2 (mIoU %).

From the SPT to the 10% setting, ours achieves a significant gain and a clear margin over the fully supervised method PointCNN. At the SPT setting, ours gains 7.9% over MPRM, which is trained with subcloud-level semantic labels (subcloud). We cannot directly and quantitatively compare the annotation labor of SPT and MPRM. Compared to MPRM, ours needs an extra small region to be drawn at random, but it removes the labor of dividing the scene into subclouds and repeatedly annotating the semantic categories of each subcloud.

Evaluation on Semantic3D

For outdoor point cloud semantic segmentation, we compare ours against five state-of-the-art fully supervised methods: SnapNet (Boulch, Le Saux, and Audebert 2017), SEGCloud (Tchapmi et al. 2017), ShellNet (Zhang, Hua, and Yeung 2019), KPConv (Thomas et al. 2019) and RandLA-Net (Hu et al. 2020). The results of online testing are summarized in Table 3.

Clearly, the more labeled points, the better the semantic segmentation performance. With 1% labeled points, we achieve 72.6% mIoU and outperform the fully supervised ShellNet (Zhang, Hua, and Yeung 2019) by 3.3%. With 10% labeled points, our method is close to the current state-of-the-art RandLA-Net. Therefore, our method is also effective on outdoor datasets.

Ablation Study

We conducted ablation studies to evaluate the importance of the main components: the self-supervised pre-training task and sparse label propagation. Scratch denotes random initialization. Pre-training denotes using the parameters of the self-supervised pretext task for initialization. λ = 0 represents training without sparse label propagation. λ = 1 and 'nonlinear'
denote a constant weight of 1 and a nonlinear weighting of the sparse label propagation loss, respectively.

Setting    Method              mIoU    OA
Fully      SnapNet ['17]       59.1    88.6
Fully      SEGCloud ['17]      61.3    88.1
Fully      ShellNet ['19]      69.3    93.2
Fully      KPConv ['19]        74.6    92.9
Fully      RandLA-Net ['20]    77.4    94.8
Weakly     Ours (1%)           72.6    93.7
Weakly     Ours (10%)          73.3    94.0

Table 3: Quantitative results on Semantic3D (reduced-8) (Hackel et al. 2017). 'OA' denotes overall accuracy.

[Figure 4 plots: validation mIoU versus epoch. (a) On S3DIS: Baseline (1%), Ours (1%), Baseline (1pt), Ours (1pt). (b) On Semantic3D: Baseline (1%), Ours (1%).]

Figure 4: mIoU curve of our method and baseline on the validation set.

Initialization    λ            1pt     SPT     1%      10%
Scratch           λ = 0        40.4    55.8    58.6    60.6
Pre-training      λ = 0        45.4    57.5    60.1    61.3
Pre-training      λ = 1        -       -       -       -
Pre-training      nonlinear    45.8    58.2    61.8    64.0

Table 4: Comparisons of different components on S3DIS Area 5 (Armeni et al. 2016) (mIoU %).

Self-supervised pre-training or training from scratch. Comparing the first and second rows of Table 4, pre-training improves the performance, while training from scratch lowers it by 5.0%, 1.7%, 1.5% and 0.7% under the four settings. We hypothesize that our pre-training is effective because it transfers knowledge learned from the self-supervised task to the subsequent task. This encourages the subsequent task to learn a more robust feature distribution. We also find that the fewer the annotated points, the greater the effect of self-supervision. We infer that a good feature representation cannot be learned from few labeled points, so more additional knowledge is needed to promote feature representation learning.

Sparse label propagation. We further analyse the effects of sparse label propagation. Comparing the second and fourth rows of Table 4, our nonlinear weighting obtains 0.4%, 0.7%, 1.7% and 2.7% improvement over λ = 0, respectively. The more labeled points there are, the better the semantic segmentation performance, because more labeled points lead to better representations and more accurate class prototypes. When λ is a constant 1, the network training collapses, because the initial training (the first 30 epochs) cannot learn a good feature representation and it is unreasonable to use these features for label propagation. As feature learning gradually improves, a larger weight is gradually applied to the loss term.

Effectiveness and efficiency. We conduct a pilot study to understand the effectiveness and show the mIoU curves of our method and the baseline on the validation set. From Figure 4, our method converges faster and reaches a higher stable value, which demonstrates that our method is more effective for weakly supervised large-scale point cloud segmentation.

Like RandLA-Net (Hu et al. 2020), we feed the entire scene to the network. We list the training time of each epoch (RandLA-Net: 267s and Ours: 280s) and the total test time (RandLA-Net: 115s and Ours: 116s) on Area 5 of the S3DIS dataset. As the sparse label propagation is introduced in the training phase, the per-epoch training time increases by 13s, but the test times are almost the same.

We also visualize the results of colorization on S3DIS in Figure 5. Although our point cloud colorization focuses on representation learning, it can be seen that the colorization is close to the real-world color.

Figure 5: Visualization of the point cloud colorization results.

Conclusion

We present a weakly supervised semantic segmentation method for large-scale point clouds. With the help of a self-supervised pretext task and sparse label propagation, our method significantly outperforms existing weakly supervised methods and almost reaches the accuracy of fully supervised methods on three challenging large-scale point cloud datasets. The experimental results show that self-supervised knowledge transfer is an effective way to improve weakly supervised performance, and our method achieves a clear margin over the compared weakly supervised methods. Furthermore, the more labeled points there are, the better our method performs, which is consistent with our expectation. We expect future work to explore more effective self-supervised knowledge representation for weakly or semi-supervised point cloud tasks.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant 61876161, Grant
61772524, the National Key Research and Development Program of China No.2020AAA0108301, the Natural Science Foundation of Shanghai No.20ZR1417700, and the CAAI-Huawei MindSpore Open Fund.
References
Achlioptas, P.; Diamanti, O.; Mitliagkas, I.; and Guibas, L. 2018. Learning representations and generative models for 3d point clouds. In ICML, 40–49.
Armeni, I.; Sener, O.; Zamir, A. R.; Jiang, H.; Brilakis, I.; Fischer, M.; and Savarese, S. 2016. 3d semantic parsing of large-scale indoor spaces. In CVPR, 1534–1543.
Bachman, P.; Hjelm, R. D.; and Buchwalter, W. 2019. Learning representations by maximizing mutual information across views. In NeurIPS, 15535–15545.
Boulch, A.; Le Saux, B.; and Audebert, N. 2017. Unstructured point cloud semantic labeling using deep segmentation networks. 3DOR 2: 7.
Chang, A. X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. 2015. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012.
Dai, A.; Chang, A. X.; Savva, M.; Halber, M.; Funkhouser, T.; and Nießner, M. 2017. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In CVPR, 5828–5839.
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. Imagenet: A large-scale hierarchical image database. In CVPR, 248–255.
Gadelha, M.; Wang, R.; and Maji, S. 2018. Multiresolution tree networks for 3d point cloud processing. In ECCV, 103–118.
Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J. D.; Schindler, K.; and Pollefeys, M. 2017. SEMANTIC3D.NET: A new large-scale point cloud classification benchmark. In ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, volume IV-1-W1, 91–98.
Hassani, K.; and Haley, M. 2019. Unsupervised multi-task feature learning on point clouds. In CVPR, 8160–8171.
He, K.; Fan, H.; Wu, Y.; Xie, S.; and Girshick, R. 2020. Momentum contrast for unsupervised visual representation learning. In CVPR, 9729–9738.
Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; and Markham, A. 2020. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In CVPR, 11108–11117.
Huang, Q.; Wang, W.; and Neumann, U. 2018. Recurrent slice networks for 3d segmentation of point clouds. In CVPR, 2626–2635.
Laine, S.; and Aila, T. 2016. Temporal ensembling for semi-supervised learning. In ICLR.
Landrieu, L.; and Simonovsky, M. 2018. Large-scale point cloud semantic segmentation with superpoint graphs. In CVPR, 4558–4567.
Larsson, G.; Maire, M.; and Shakhnarovich, G. 2017. Colorization as a proxy task for visual understanding. In CVPR, 6874–6883.
Li, J.; Chen, B. M.; and Hee Lee, G. 2018. So-net: Self-organizing network for point cloud analysis. In CVPR, 9397–9406.
Li, X.; Yu, L.; Fu, C.-W.; Cohen-Or, D.; and Heng, P.-A. 2020. Unsupervised Detection of Distinctive Regions on 3D Shapes. ACM Transactions on Graphics 39(5): 1–14.
Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; and Chen, B. 2018. Pointcnn: Convolution on x-transformed points. In NeurIPS, 820–830.
Newell, A.; and Deng, J. 2020. How useful is self-supervised pretraining for visual tasks? In CVPR, 7345–7354.
Qi, C. R.; Su, H.; Mo, K.; and Guibas, L. J. 2017a. Pointnet: Deep learning on point sets for 3d classification and segmentation. In CVPR, 652–660.
Qi, C. R.; Yi, L.; Su, H.; and Guibas, L. J. 2017b. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In NeurIPS, 5099–5108.
Qiu, Z.; Yao, T.; and Mei, T. 2017. Learning spatio-temporal representation with pseudo-3d residual networks. In ICCV, 5533–5541.
Rethage, D.; Wald, J.; Sturm, J.; Navab, N.; and Tombari, F. 2018. Fully-convolutional point networks for large-scale point clouds. In ECCV, 596–611.
Sauder, J.; and Sievers, B. 2019. Self-supervised deep learning on point clouds by reconstructing space. In NeurIPS, 12962–12972.
Su, H.; Jampani, V.; Sun, D.; Maji, S.; Kalogerakis, E.; Yang, M.-H.; and Kautz, J. 2018. Splatnet: Sparse lattice networks for point cloud processing. In CVPR, 2530–2539.
Tarvainen, A.; and Valpola, H. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS, 1195–1204.
Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.; and Savarese, S. 2017. Segcloud: Semantic segmentation of 3d point clouds. In International Conference on 3D Vision (3DV), 537–547.
Thomas, H.; Qi, C. R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; and Guibas, L. J. 2019. Kpconv: Flexible and deformable convolution for point clouds. In ICCV, 6411–6420.
Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S. E.; Bronstein, M. M.; and Solomon, J. M. 2019. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics 38(5): 1–12.
Wei, J.; Lin, G.; Yap, K.-H.; Hung, T.-Y.; and Xie, L. 2020. Multi-path region mining for weakly supervised 3D semantic segmentation on point clouds. In CVPR, 4384–4393.
Wu, W.; Qi, Z.; and Fuxin, L. 2019. Pointconv: Deep convolutional networks on 3d point clouds. In CVPR, 9621–9630.
Xie, S.; Gu, J.; Guo, D.; Qi, C. R.; Guibas, L. J.; and Litany, O. 2020. PointContrast: Unsupervised pre-training for 3D point cloud understanding. In ECCV.
Xu, X.; and Lee, G. H. 2020. Weakly supervised semantic point cloud segmentation: towards 10x fewer labels. In CVPR, 13706–13715.
Yang, J.; Zhang, Q.; Ni, B.; Li, L.; Liu, J.; Zhou, M.; and Tian, Q. 2019. Modeling point clouds with self-attention and gumbel subset sampling. In CVPR, 3323–3332.
Yang, Y.; Feng, C.; Shen, Y.; and Tian, D. 2018. Foldingnet: Point cloud auto-encoder via deep grid deformation. In CVPR, 206–215.
Zhang, R.; Isola, P.; and Efros, A. A. 2016. Colorful image colorization. In ECCV, 649–666.
Zhang, Z.; Hua, B.-S.; and Yeung, S.-K. 2019. Shellnet: Efficient point cloud convolutional neural networks using concentric shells statistics. In ICCV, 1607–1616.
Zhou, Z.-H. 2018. A brief introduction to weakly supervised learning. National Science Review 5(1): 44–53.

