Abstract
Object detection, or localization, is an incremental step in the progression from coarse to fine inference on digital images. It provides not only the classes of the objects in an image but also their locations, given as bounding boxes or centroids. Semantic segmentation gives finer inference by predicting a label for every pixel in the input image, where each pixel is labelled according to the object class that encloses it. Furthering this evolution, instance segmentation assigns different labels to separate instances of objects belonging to the same class. Hence, instance segmentation may be defined as the technique of simultaneously solving the problems of object detection and semantic segmentation. This survey paper on instance segmentation discusses its background, issues, techniques, evolution, popular datasets, related work up to the state of the art, and future scope. The paper is intended as a reference for those who want to do research in the field of instance segmentation.
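The relationship between the three outputs described above can be illustrated with a toy example. The following is only a minimal sketch, not a method from the survey: the 4x6 instance map, the class names, and the helper functions `semantic_map` and `bounding_boxes` are all hypothetical, chosen to show that a per-pixel instance labelling carries enough information to recover both a semantic segmentation (class per pixel) and detection-style bounding boxes (one box per instance).

```python
from typing import Dict, List, Tuple

# Toy 4x6 "image": each entry is a hypothetical instance id (0 = background).
# Instances 1 and 2 belong to the same class, instance 3 to another.
INSTANCE_MAP: List[List[int]] = [
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [0, 3, 3, 3, 0, 0],
]
INSTANCE_CLASS: Dict[int, str] = {1: "cat", 2: "cat", 3: "dog"}


def semantic_map(
    instance_map: List[List[int]], instance_class: Dict[int, str]
) -> List[List[str]]:
    """Semantic segmentation view: one class label per pixel.

    Separate instances of the same class become indistinguishable here.
    """
    return [
        [instance_class.get(inst, "background") for inst in row]
        for row in instance_map
    ]


def bounding_boxes(
    instance_map: List[List[int]],
) -> Dict[int, Tuple[int, int, int, int]]:
    """Detection view: one (x_min, y_min, x_max, y_max) box per instance."""
    boxes: Dict[int, Tuple[int, int, int, int]] = {}
    for y, row in enumerate(instance_map):
        for x, inst in enumerate(row):
            if inst == 0:  # skip background pixels
                continue
            if inst not in boxes:
                boxes[inst] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


if __name__ == "__main__":
    for row in semantic_map(INSTANCE_MAP, INSTANCE_CLASS):
        print(row)
    print(bounding_boxes(INSTANCE_MAP))
```

In this toy setting the two "cat" regions share one label in the semantic map but keep distinct ids and distinct boxes in the instance view, which is the distinction the abstract draws between semantic and instance segmentation.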
Cite this article
Hafiz, A.M., Bhat, G.M. A survey on instance segmentation: state of the art. Int J Multimed Info Retr 9, 171–189 (2020). https://doi.org/10.1007/s13735-020-00195-x
DOI: https://doi.org/10.1007/s13735-020-00195-x