Scene segmentation with DAG-recurrent neural networks

B Shuai, Z Zuo, B Wang, G Wang - IEEE transactions on pattern …, 2017 - ieeexplore.ieee.org
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
In this paper, we address the challenging task of scene segmentation. To capture the rich contextual dependencies over image regions, we propose Directed Acyclic Graph-Recurrent Neural Networks (DAG-RNN) to perform context aggregation over locally connected feature maps. More specifically, DAG-RNN is placed on top of a pre-trained CNN (the feature extractor) to embed context into local features so that their representational capability is enhanced. Compared with a plain CNN (as in Fully Convolutional Networks, FCN), DAG-RNN is empirically found to be significantly more effective at aggregating context, and it therefore demonstrates noticeable performance superiority over FCNs on scene segmentation. In addition, DAG-RNN entails dramatically fewer parameters and demands fewer computation operations, which makes it a more favorable candidate for resource-constrained embedded devices. Meanwhile, because class occurrence frequencies are extremely imbalanced in scene segmentation, we propose a novel class-weighted loss to train the segmentation network. The loss assigns reasonably higher attention weights to infrequent classes during network training, which is essential to boost their parsing performance. We evaluate our segmentation network on three challenging public scene segmentation benchmarks: SIFT Flow, PASCAL Context and COCO Stuff. On all three, we achieve very impressive segmentation performance.
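The idea behind a class-weighted loss can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so everything specific here is an assumption made for illustration: the function names, the `rare_threshold` parameter, and the log-scaled weighting rule (weight 1 for frequent classes, growing logarithmically as a class becomes rarer) are hypothetical and may differ from what the paper actually uses.

```python
import math
from collections import Counter


def class_weights(label_maps, num_classes, rare_threshold=0.05):
    """Derive one loss weight per class from pixel-level class frequencies.

    ASSUMPTION: the rule below (1 + ln(threshold / frequency), floored at 1)
    is an illustrative choice, not the formula from the paper.
    """
    # Count how often each class id appears over all pixels of all label maps.
    counts = Counter(
        label for label_map in label_maps for row in label_map for label in row
    )
    total = sum(counts.values())
    weights = []
    for c in range(num_classes):
        freq = counts.get(c, 0) / total
        if freq == 0:
            # Class never occurs, so its weight is never used; keep it neutral.
            weights.append(1.0)
        else:
            # Frequent classes (freq >= threshold) stay at 1; rare ones get more.
            weights.append(max(1.0, 1.0 + math.log(rare_threshold / freq)))
    return weights


def weighted_cross_entropy(probs, labels, weights):
    """Average per-pixel cross-entropy, each term scaled by its class weight.

    probs:  list of per-pixel probability vectors (one entry per class)
    labels: list of ground-truth class ids, one per pixel
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Toy label map: class 0 covers 7 of 8 pixels, class 1 covers 1 of 8.
toy_labels = [[[0, 0, 0, 0], [0, 0, 0, 1]]]
toy_weights = class_weights(toy_labels, num_classes=2, rare_threshold=0.5)
toy_loss = weighted_cross_entropy(
    [[0.5, 0.5], [0.5, 0.5]], [0, 1], [1.0, 3.0]
)
```

In this toy setup the rare class receives a weight above 1 while the dominant class keeps weight 1, so errors on rare-class pixels contribute more to the loss than errors on frequent-class pixels.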