Wider or deeper: Revisiting the ResNet model for visual recognition

Z Wu, C Shen, A Van Den Hengel - Pattern Recognition, 2019 - Elsevier
Abstract
The community has been going deeper and deeper in designing one cutting-edge network after another, yet some works suggest that we may have gone too far along this dimension. Some researchers unravelled a residual network into an exponentially wider one, and attributed the success of residual networks to the fusion of a large number of relatively shallow models. Since some of their early claims are still not settled, in this paper we dig further into this topic, i.e., the unravelled view of residual networks. Based on that, we try to find a good compromise between depth and width. Afterwards, we walk through a typical pipeline for developing a deep-learning-based algorithm. We start from a group of relatively shallow networks, which perform as well as or even better than the current (much deeper) state-of-the-art models on the ImageNet classification dataset. Then, we initialize fully convolutional networks (FCNs) using our pre-trained models and fine-tune them for semantic image segmentation. Results show that the proposed networks, used as pre-trained features, can substantially boost existing methods. Even without exhausting the sophisticated techniques for improving the classic FCN model, we achieve results comparable to the best performers on four widely used datasets, i.e., Cityscapes, PASCAL VOC, ADE20K and PASCAL-Context. The code and pre-trained models are released for public access.
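For readers unfamiliar with the "unravelled view" referenced above, the idea can be sketched with the standard residual recurrence (the notation below is ours, a minimal illustration rather than the paper's own derivation). Writing each residual block as y_{l+1} = y_l + f_{l+1}(y_l), unrolling just two blocks gives

\begin{aligned}
y_2 &= y_1 + f_2(y_1) \\
    &= \bigl[\, y_0 + f_1(y_0) \,\bigr] + f_2\bigl( y_0 + f_1(y_0) \bigr),
\end{aligned}

i.e., the output is a sum over paths that traverse different subsets of the blocks. With n blocks this expansion yields 2^n implicit paths, most of them shallow, which is the sense in which a deep residual network can be read as an exponentially wide ensemble of relatively shallow models.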