Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

A survey on instance segmentation: state of the art

  • Trends and Surveys
  • Published:
International Journal of Multimedia Information Retrieval Aims and scope Submit manuscript

Abstract

Object detection or localization is an incremental step in progression from coarse to fine digital image inference. It not only provides the classes of the image objects, but also provides the location of the image objects which have been classified. The location is given in the form of bounding boxes or centroids. Semantic segmentation gives fine inference by predicting labels for every pixel in the input image. Each pixel is labelled according to the object class within which it is enclosed. Furthering this evolution, instance segmentation gives different labels for separate instances of objects belonging to the same class. Hence, instance segmentation may be defined as the technique of simultaneously solving the problem of object detection as well as that of semantic segmentation. In this survey paper on instance segmentation, its background, issues, techniques, evolution, popular datasets, related work up to the state of the art and future scope have been discussed. The paper provides valuable information for those who want to do research in the field of instance segmentation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

References

  1. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J (2018) A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65. https://doi.org/10.1016/j.asoc.2018.05.018

    Article  Google Scholar 

  2. Tang Y (2013) Deep learning using linear support vector machines. arXiv preprint arXiv:13060239

  3. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

    Article  Google Scholar 

  4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

    Article  Google Scholar 

  5. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

    MATH  Google Scholar 

  6. Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26. https://doi.org/10.1016/j.neucom.2016.12.038

    Article  Google Scholar 

  7. Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683

    Article  Google Scholar 

  8. Kirsch RA, Cahn L, Ray C, Urban GH (1957) Experiments in processing pictorial information with a digital computer. In: Eastern joint computer conference, pp 221–229

  9. Earnest LD (1963) Machine reading of cursive script. In: IFIP congress, Amsterdam. pp 462–466

  10. Moore GA (1968) Automatic scanning and computer processes for the quantitative analysis of micrographs and equivalent subjects. In: Cheng GC (ed) Pictorial Pattern Recognition. Thompson, Washington DC, pp 275–326

    Google Scholar 

  11. Rumelhart DE, Hinton GE, McClelland JL (1986) A general framework for parallel distributed processing. Parallel distributed processing. Explor Microstruct Cogn 1:45–76

    Google Scholar 

  12. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286. https://doi.org/10.1109/5.18626

    Article  Google Scholar 

  13. Nouboud F, Plamondon R (1990) On-line recognition of handprinted characters: survey and beta tests. Pattern Recogn 23(9):1031–1044. https://doi.org/10.1016/0031-3203(90)90111-W

    Article  Google Scholar 

  14. Mori S, Suen CY, Yamamoto K (1992) Historical review of OCR research and development. Proc IEEE 80(7):1029–1058. https://doi.org/10.1109/5.156468

    Article  Google Scholar 

  15. Bunke H, Wang PS-P (1994) HandBook of Character Recognition and Document Image Analysis. World Scientific, Singapore

  16. Cortes C, Vapnik V (1995) Support vector machine. Mach Learn 20(3):273–297

    MATH  Google Scholar 

  17. O’Gorman L, Kasturi R (1995) Document Image Analysis. IEEE Computer Society Press, New York

    Google Scholar 

  18. Tang YY, Lee S-W, Suen CY (1996) Automatic document processing: a survey. Pattern Recogn 29(12):1931–1952. https://doi.org/10.1016/S0031-3203(96)00044-1

    Article  Google Scholar 

  19. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, pp 2278–2324

  20. Nagy G (2000) Twenty years of document image analysis in PAMI. IEEE Trans Pattern Anal Mach Intell 22(1):38–62. https://doi.org/10.1109/34.824820

    Article  Google Scholar 

  21. Ahmed P, Al-Ohali Y (2000) Arabic character recognition: progress and challenges. J King Saud Univ Comput Inf Sci 12:85–116. https://doi.org/10.1016/S1319-1578(00)80004-X

    Article  Google Scholar 

  22. Chen L, Hermans A, Papandreou G, Schroff F, Wang P, Adam H (2018) MaskLab: instance segmentation by refining object detection with semantic and direction features. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, 18–23 June 2018, pp 4013–4022. https://doi.org/10.1109/cvpr.2018.00422

  23. Dickinson SJ, Leonardis A, Schiele B, Tarr MJ (2009) Object categorization: computer and human vision perspectives. Cambridge University Press, Cambridge

    Book  Google Scholar 

  24. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  25. Gidaris S, Komodakis N (2015) Object detection via a multiregion and semantic segmentation-aware CNN model. In: ICCV

  26. Zhu X, Vondrick C, Fowlkes CC, Ramanan D (2016) Do we need more training data? Int J Comput Vis 119(1):76–92

    Article  MathSciNet  Google Scholar 

  27. Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318. https://doi.org/10.1007/s11263-019-01247-4

    Article  Google Scholar 

  28. Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE international conference on computer vision. IEEE, pp 1150–1157

  29. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition, 2005. CVPR 2005. IEEE, pp 886–893

  30. Sivic (2003) Zisserman Video Google: a text retrieval approach to object matching in videos. In: Proceedings ninth IEEE international conference on computer vision, 13–16 Oct 2003, vol 1472, pp 1470–1477. https://doi.org/10.1109/iccv.2003.1238663

  31. Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In: European conference on computer vision. Springer, pp 143–156

  32. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828. https://doi.org/10.1109/TPAMI.2013.50

    Article  Google Scholar 

  33. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp 1097–1105

  34. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR

  35. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 770–778

  36. Huang G, Liu Z, Maaten Lvd, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 2261–2269. https://doi.org/10.1109/cvpr.2017.243

  37. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  38. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) Overfeat: integrated recognition, localization and detection using convolutional networks. In: ICLR

  39. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, pp 818–833

  40. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariance shift. In: ICML, pp 448–456

  41. Girshick R (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV), 7–13 Dec 2015, pp 1440–1448. https://doi.org/10.1109/iccv.2015.169

  42. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

    Article  Google Scholar 

  43. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  44. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645. https://doi.org/10.1109/TPAMI.2009.167

    Article  Google Scholar 

  45. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824

    Article  Google Scholar 

  46. Hariharan B, Arbeláez P, Girshick R, Malik J (2017) Object instance segmentation and fine-grained localization using hypercolumns. IEEE Trans Pattern Anal Mach Intell 39(4):627–639. https://doi.org/10.1109/TPAMI.2016.2578328

    Article  Google Scholar 

  47. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: CVPR

  48. Shrivastava A, Sukthankar R, Malik J, Gupta A (2017) Beyond skip connections: top-down modulation for object detection. In: CVPR. arXiv:1612.06851

  49. Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:151107122

  50. Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: NIPS, pp 379–387

  51. Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184

    Article  Google Scholar 

  52. Zhou J, Cui G, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:181208434

  53. Lin T, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 936–944. https://doi.org/10.1109/cvpr.2017.106

  54. Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) RON: reverse connection with objectness prior networks for object detection. In: CVPR, pp 5936–5944

  55. Lenc K, Vedaldi (2015) A understanding image representations by measuring their equivariance and equivalence. In: CVPR, pp 991–999

  56. Liu L, Fieguth P, Guo Y, Wang X, Pietikäinen M (2017) Local binary features for texture classification: taxonomy and experimental study. Pattern Recogn 62:135–160. https://doi.org/10.1016/j.patcog.2016.08.032

    Article  Google Scholar 

  57. Chellappa R (2016) The changing fortunes of pattern recognition and computer vision. Image Vis Comput 55:3–5. https://doi.org/10.1016/j.imavis.2016.04.005

    Article  Google Scholar 

  58. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: ICCV

  59. Mordan T, Thome N, Henaff G, Cord M (2019) End-to-end learning of latent deformable part-based representations for object detection. Int J Comput Vis 127(11):1659–1679. https://doi.org/10.1007/s11263-018-1109-z

    Article  Google Scholar 

  60. Ouyang W, Wang X (2013) Joint deep learning for pedestrian detection. In: Proceedings of the IEEE international conference on computer vision, pp 2056–2063

  61. Wang X, Shrivastava A, Gupta A (2017) A-fast-RCNN: hard positive generation via adversary for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 3039–3048. https://doi.org/10.1109/cvpr.2017.324

  62. Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in CNNs. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, 18–23 June 2018, pp 6995–7003. https://doi.org/10.1109/cvpr.2018.00731

  63. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

    Article  MathSciNet  Google Scholar 

  64. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick L (2014) Microsoft COCO: common objects in context

  65. Everingham M, Gool LV, Williams CKI, Winn J, Zisserman A (2009) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88:303–308

    Article  Google Scholar 

  66. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

    Article  Google Scholar 

  67. He K, Gkioxari G, Dollar P, Girshick R (2018) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/tpami.2018.2844175

    Article  Google Scholar 

  68. Li Y, Qi H, Dai J, Ji X, Wei Y (2017) Fully convolutional instance-aware semantic segmentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 4438–4446. https://doi.org/10.1109/cvpr.2017.472

  69. Bai M, Urtasun R (2017) Deep watershed transform for instance segmentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 2858–2866. https://doi.org/10.1109/cvpr.2017.305

  70. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), 27–30 June 2016, pp 3213–3223. https://doi.org/10.1109/cvpr.2016.350

  71. Neuhold G, Ollmann T, Bulò SR, Kontschieder P (2017) The Mapillary vistas dataset for semantic understanding of street scenes. In: 2017 IEEE international conference on computer vision (ICCV), 22–29 Oct 2017, pp 5000–5009. https://doi.org/10.1109/iccv.2017.534

  72. Zagoruyko S, Lerer A, Lin T-Y, Pinheiro PO, Gross S, Chintala S, Dollár P (2016) A multipath network for object detection. arXiv preprint arXiv:160402135

  73. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 5987–5995. https://doi.org/10.1109/cvpr.2017.634

  74. Chen Y, Li J, Xiao H, Jin X, Yan S, Feng J (2017) Dual path networks. In: Advances in neural information processing systems, pp 4467–4475

  75. Hariharan B, Arbeláez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. In: European conference on computer vision. Springer, pp 297–312

  76. Sande KEAVD, Uijlings JRR, Gevers T, Smeulders AWM (2011) Segmentation as selective search for object recognition. In: 2011 international conference on computer vision, 6–13 Nov 2011, pp 1879–1886. https://doi.org/10.1109/iccv.2011.6126456

  77. Arbeláez P, Pont-Tuset J, Barron J, Marques F, Malik J (2014) Multiscale combinatorial grouping. In: 2014 IEEE conference on computer vision and pattern recognition, 23–28 June 2014, pp 328–335. https://doi.org/10.1109/cvpr.2014.49

  78. Dai J, He K, Sun J (2016) Instance-aware semantic segmentation via multi-task network cascades. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), 27–30 June 2016, pp 3150–3158. https://doi.org/10.1109/cvpr.2016.343

  79. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768

  80. Peng C, Xiao T, Li Z, Jiang Y, Zhang X, Jia K, Yu G, Sun J (2018) MegDet: a large mini-batch object detector. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, 18–23 June 2018, pp 6181–6189. https://doi.org/10.1109/cvpr.2018.00647

  81. Chen K, Pang J, Wang J, Xiong Y, Li X, Sun S, Feng W, Liu Z, Shi J, Ouyang W (2019) Hybrid task cascade for instance segmentation. arXiv preprint arXiv:190107518

  82. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

  83. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 6517–6525. https://doi.org/10.1109/cvpr.2017.690

  84. Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: 2017 IEEE international conference on computer vision (ICCV), 22–29 Oct 2017, pp 2999–3007. https://doi.org/10.1109/iccv.2017.324

  85. Kirillov A, Levinkov E, Andres B, Savchynskyy B, Rother C (2017) InstanceCut: from edges to instances with multicut. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 7322–7331. https://doi.org/10.1109/cvpr.2017.774

  86. Arnab A, Torr PHS (2017) Pixelwise instance segmentation with a dynamically instantiated network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), 21–26 July 2017, pp 879–888. https://doi.org/10.1109/cvpr.2017.100

  87. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:14127062

  88. Pinheiro PO, Collobert R, Dollar P (2015) Learning to segment object candidates 1990–1998

  89. Pinheiro PO, Lin T-Y, Collobert R, Dollár P (2016) Learning to refine object segments. In: European conference on computer vision, 2016. Springer, pp 75–91

  90. Dai J, He K, Li Y, Ren S, Sun J (2016) Instance-sensitive fully convolutional networks. In: European conference on computer vision. Springer, pp 534–549

  91. Chen X, Girshick R, He K, Dollár P (2019) TensorMask: a foundation for dense object segmentation. arXiv preprint arXiv:190312174

  92. Hariharan B, Arbelaez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: CVPR

  93. Bell S, Zitnick CL, Bala K, Girshick RB (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: CVPR

  94. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2015) Object detectors emerge in deep scene CNNs. In: ICLR

  95. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2921–2929

  96. Uhrig J, Cordts M, Franke U, Brox T (2016) Pixel-level encoding and depth layering for instance-level semantic labeling. arXiv:1604.05096

  97. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. TPAMI

  98. Eigen D, Fergus R (2015) Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: ICCV

  99. Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2017) Understanding convolution for semantic segmentation. arXiv:1702.08502

  100. Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587

  101. Abadi M, Agarwal A (2016) Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467

  102. Dabov K, Foi A, Katkovnik V, Egiazarian K (2007) Image denoising by sparse 3-d transform-domain collaborative filtering. Trans Image Process (TIP) 16:2080–2095

    Article  MathSciNet  Google Scholar 

  103. Burger HC, Schuler CJ, Harmeling S (2012) Image denoising: can plain neural networks compete with BM3D? In: Computer vision and pattern recognition (CVPR)

  104. Burger HC, Schuler CJ, Harmeling S (2012) Image denoising with multi-layer perceptrons, part 2: training trade-offs and analysis of their mechanisms. arXiv:1211.1552

  105. Lefkimmiatis S (2017) Non-local color image denoising with convolutional neural networks. In: Computer vision and pattern recognition (CVPR)

  106. Lafferty J, McCallum A, Pereira FC (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: International conference on machine learning (ICML)

  107. Krahenbuhl P, Koltun V (2011) Efficient inference in fully connected crfs with gaussian edge potentials. In: Neural information processing systems (NIPS)

  108. Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z, Du D, Huang C, Torr PH (2015) Conditional random fields as recurrent neural networks. In: International conference on computer vision (ICCV)

  109. Schwing AG, Urtasun R (2015) Fully connected deep structured networks. arXiv:1503.02351

  110. Chandra S, Usunier N, Kokkinos I (2017) Dense and low-rank Gaussian CRFs using deep embeddings. In: International conference on computer vision (ICCV)

  111. Harley A, Derpanis K, Kokkinos I (2017) Segmentation-aware convolutional networks using local attention masks. In: International conference on computer vision (ICCV)

  112. Liu S, Mello SD, Gu J, Zhong G, Yang MH, Kautz J (2017) Learning affinity via spatial propagation networks. In: Neural information processing systems (NIPS)

  113. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, 18–23 June 2018, pp 7794–7803. https://doi.org/10.1109/cvpr.2018.00813

  114. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80

    Article  Google Scholar 

  115. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Neural information processing systems (NIPS)

  116. Buades A, Coll B, Morel JM (2005) A non-local algorithm for image denoising. In: Computer vision and pattern recognition (CVPR)

  117. Efros AA, Leung TK (1999) Texture synthesis by nonparametric sampling. In: International conference on computer vision (ICCV)

  118. Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters—improve semantic segmentation by global convolutional network. In: CVPR

  119. Ghiasi G, Fowlkes CC (2016) Laplacian reconstruction and refinement for semantic segmentation. In: ECCV

  120. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, 2015. Springer, pp 234–241

  121. Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: ICCV

  122. Fu C, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. arXiv:1701.06659

  123. Cai Z, Fan Q, Feris RS, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection. In: ECCV

  124. Zagoruyko S, Lerer A, Lin T, Pinheiro PHO, Gross S, Chintala S, Dollar P (2016) A multipath network for object detection. In: BMVC

  125. Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: CVPR

  126. Ren S, He K, Girshick RB, Zhang X, Sun J (2017) Object detection networks on convolutional feature maps. PAMI

  127. Zeng X, Ouyang W, Yan J, Li H, Xiao T, Wang K, Liu Y, Zhou Y, Yang B, Wang Z, Zhou H, Wang X (2016) Crafting GBD-net for object detection. arXiv:1610.02579

  128. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: CVPR

  129. Liu W, Rabinovich A, Berg AC (2015) Parsenet: looking wider to see better. arXiv:1506.04579

  130. Cao Y, Xu J, Lin S, Wei F, Hu H (2019) GCNet: non-local networks meet squeeze-excitation networks and beyond. arXiv:1904.11492v1

  131. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: IEEE conference on computer vision and pattern recognition

  132. Bolya D, Zhou C, Xiao F, Lee YJ (2019) YOLACT: real-time instance segmentation. arXiv preprint arXiv:190402689

  133. Huang Z, Huang L, Gong Y, Huang C, Wang X (2019) Mask scoring R-CNN. arXiv e-prints

  134. Wang X, Zhang R, Kong T, Li L, Shen C (2020) SOLOv2: dynamic, faster and stronger. arXiv preprint arXiv:200310152

  135. Chen H, Sun K, Tian Z, Shen C, Huang Y, Yan Y (2020) BlendMask: top-down meets bottom-up for instance segmentation. arXiv preprint arXiv:200100309

  136. Wang X, Kong T, Shen C, Jiang Y, Li L (2019) SOLO: segmenting objects by locations. arXiv preprint arXiv:191204488

  137. Lee Y, Park J (15 Nov 2019) CenterMask: real-time anchor-free instance segmentation. arXiv:1911.06667v1

  138. Xie E, Sun P, Song X, Wang W, Liu X, Liang D, Shen C, Luo P (2019) PolarMask: single shot instance segmentation with polar representation. arXiv:1909.13226v2

  139. Sun K, Xiao B, Liu D, Wang J (2019) Deep high resolution representation learning for hman pose estimation. In: CVPR

  140. Li J, Zhao J, Wei Y, Lang C, Li Y, Sim T, Yan S, Feng J (2017) Multi-human parsing in the wild. arXiv:1705.07206

  141. Zhao J, Li J, Cheng Y, Zhou L, Sim T, Yan S, Feng J (2018) Understanding humans in crowded scenes: deep nested adversarial learning and a new benchmark for multi-human parsing. arXiv:1804.03287v3

  142. Chen X, Mottaghi R, Liu X, Fidler S, Urtasun R, Yuille A (2014) Detect what you can: detecting and representing objects using holistic models and body parts. In: CVPR, pp 1971–1978

  143. Brabandere BD, Neven D, Gool LV (2017) Semantic instance segmentation with a discriminative loss function. arXiv:1708.02551v1

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Abdul Mueed Hafiz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hafiz, A.M., Bhat, G.M. A survey on instance segmentation: state of the art. Int J Multimed Info Retr 9, 171–189 (2020). https://doi.org/10.1007/s13735-020-00195-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13735-020-00195-x

Keywords