Abstract
We study the problem of salient object subitizing, i.e. predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14K everyday images, annotated through an online crowdsourcing marketplace. We show that an end-to-end trained convolutional neural network (CNN) model achieves prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also performs significantly better than chance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.
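To make the prediction target concrete, the following is a minimal Python sketch of how a per-image salient object count could be turned into a discrete label for a classifier. It assumes the five categories 0, 1, 2, 3 and 4+ of Zhang et al. (2015a), where all counts of four or more share one class; the names `SUBITIZING_CLASSES` and `subitizing_label` are hypothetical and not taken from the authors' code.

```python
# Hypothetical helper, not the authors' code.
# ASSUMPTION: five subitizing classes 0, 1, 2, 3 and "4+",
# where every count of four or more is merged into "4+".
SUBITIZING_CLASSES = ["0", "1", "2", "3", "4+"]


def subitizing_label(count: int) -> str:
    """Map a non-negative salient object count to its subitizing class."""
    if count < 0:
        raise ValueError("object count must be non-negative")
    # Clamp at index 4 so that 4, 5, 6, ... all map to "4+".
    return SUBITIZING_CLASSES[min(count, 4)]


print([subitizing_label(n) for n in [0, 1, 3, 4, 7]])
# ['0', '1', '3', '4+', '4+']
```

Merging large counts into a single 4+ class keeps the output space small and mirrors the limited subitizing range, at the cost of not distinguishing, say, five objects from ten.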
Notes
We use the subset of ImageNet images with bounding box annotations.
The F-score is computed as \(\frac{2RP}{(R+P)}\), where R and P denote recall and precision respectively.
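The F-score definition above is the harmonic mean of recall and precision. A minimal Python sketch follows; the zero-denominator convention (returning 0 when both R and P are 0) is an assumption, since the note does not say how that case is handled, and the function name `f_score` is hypothetical.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall R and precision P: 2RP / (R + P)."""
    if recall + precision == 0:
        # Assumed convention: F is defined as 0 when R = P = 0.
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Illustrative values only, not results from the paper.
print(f_score(recall=0.8, precision=0.6))  # about 0.686
```

Because it is a harmonic mean, the F-score is pulled toward the smaller of R and P, so a detector cannot score well by maximizing only one of them.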
When evaluated on the test set used by Zhang et al. (2015a), our best method GoogleNet_Syn_FT achieves a mAP score of 85.0%.
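The note reports mAP without restating its definition. For readers unfamiliar with the metric, the following Python sketch computes it under explicit assumptions: each subitizing category is scored one-vs-rest, average precision (AP) is the non-interpolated mean of precision at the rank of each positive image, and mAP is the unweighted mean of the per-category APs. The paper's exact protocol may differ, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of positives."""
    # Rank images by descending classifier score for this category.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix: Sequence[Sequence[float]],
                           true_classes: Sequence[int],
                           num_classes: int) -> float:
    """mAP: unweighted mean of one-vs-rest AP over all categories."""
    aps = []
    for c in range(num_classes):
        scores = [row[c] for row in score_matrix]        # scores for class c
        labels = [1 if t == c else 0 for t in true_classes]
        aps.append(average_precision(scores, labels))
    return sum(aps) / num_classes


# Toy example: positives at ranks 1 and 3 give AP = (1/1 + 2/3) / 2.
print(average_precision([0.9, 0.8, 0.7], [1, 0, 1]))  # about 0.833
```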
References
Achanta, R., Hemami, S., Estrada, F., & Süsstrunk, S. (2009). Frequency-tuned salient region detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Anoraganingrum, D. (1999). Cell segmentation with median filter and mathematical morphology operation. In International conference on image analysis and processing.
Arteta, C., Lempitsky, V., Noble, J. A., & Zisserman, A. (2014). Interactive object counting. In European conference on computer vision (ECCV).
Atkinson, J., Campbell, F. W., & Francis, M. R. (1976). The magic number 4 ± 0: A new look at visual numerosity judgements. Perception, 5(3), 327–334.
Berg, T. L., & Berg, A. C. (2009). Finding iconic images. In IEEE conference on computer vision and pattern recognition (CVPR) workshops.
Borji, A., Sihite, D. N., & Itti, L. (2012). Salient object detection: A benchmark. In European conference on computer vision (ECCV).
Boysen, S. T., & Capaldi, E. J. (2014). The development of numerical competence: Animal and human models. Hove: Psychology Press.
Chan, A. B., & Vasconcelos, N. (2009). Bayesian Poisson regression for crowd counting. In IEEE international conference on computer vision (ICCV).
Chan, A. B., Liang, Z.-S. J., & Vasconcelos, N. (2008). Privacy preserving crowd monitoring: Counting people without people models or tracking. In IEEE conference on computer vision and pattern recognition (CVPR).
Chatfield, K., Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011). The devil is in the details: An evaluation of recent feature encoding methods. In British Machine Vision Conference (BMVC).
Cheng, M.-M., Zhang, G.-X., Mitra, N. J., Huang, X., & Hu, S.-M. (2011). Global contrast based salient region detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Cheng, M.-M., Mitra, N. J., Huang, X., Torr, P. H. S., & Hu, S.-M. (2015). Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 569–582.
Choi, J., Jung, C., Lee, J., & Kim, C. (2014). Determining the existence of objects in an image and its application to image thumbnailing. IEEE Signal Processing Letters, 21(8), 957–961.
Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. (2009). NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of the ACM international conference on image and video retrieval.
Clements, D. H. (1999). Subitizing: What is it? Why teach it? Teaching Children Mathematics, 5, 400–405.
Davis, H., & Pérusse, R. (1988). Numerical competence in animals: Definitional issues, current evidence, and a new research agenda. Behavioral and Brain Sciences, 11(4), 561–579.
Dehaene, S. (2011). The number sense: How the mind creates mathematics. Oxford: Oxford University Press.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Feng, J., Wei, Y., Tao, L., Zhang, C., & Sun, J. (2011). Salient object detection by composition. In IEEE international conference on computer vision (ICCV).
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE conference on computer vision and pattern recognition (CVPR).
Gopalakrishnan, V., Hu, Y., & Rajan, D. (2009). Random walks on graphs to model saliency in images. In IEEE conference on computer vision and pattern recognition (CVPR).
Gross, H. J. (2012). The magical number four: A biological, historical and mythological enigma. Communicative & Integrative Biology, 5(1), 1–2.
Gross, H. J., Pahl, M., Si, A., Zhu, H., Tautz, J., & Zhang, S. (2009). Number-based visual generalisation in the honeybee. PLoS ONE, 4(1), e4263.
Gurari, D., & Grauman, K. (2016). Visual question: Predicting if a crowd will agree on the answer. arXiv preprint arXiv:1608.08188.
Heo, J.-P., Lin, Z., & Yoon, S.-E. (2014). Distance encoded product quantization. In IEEE conference on computer vision and pattern recognition (CVPR).
Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on deep learning, NIPS.
Jansen, B. R. J., Hofman, A. D., Straatemeier, M., van Bers, B. M. C. W., Raijmakers, M. E. J., & van der Maas, H. L. J. (2014). The role of pattern recognition in children’s exact enumeration of small numbers. British Journal of Developmental Psychology, 32(2), 178–194.
Jevons, W. S. (1871). The power of numerical discrimination. Nature, 3, 281–282.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. In ACM international conference on multimedia.
Kaufman, E. L., Lord, M. W., Reese, T. W., & Volkmann, J. (1949). The discrimination of visual number. The American Journal of Psychology, 62, 498–525.
Kazemzadeh, S., Ordonez, V., Matten, M., & Berg, T. L. (2014). ReferItGame: Referring to objects in photographs of natural scenes. In Conference on empirical methods in natural language processing (EMNLP).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS).
Lee, Y. J., Ghosh, J., & Grauman, K. (2012). Discovering important people and objects for egocentric video summarization. In IEEE conference on computer vision and pattern recognition (CVPR).
Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. In Advances in neural information processing systems (NIPS).
Li, X., Uricchio, T., Ballan, L., Bertini, M., Snoek, C. G. M., & Del Bimbo, A. (2016). Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Computing Surveys, 49(1), 14:1–14:39.
Li, Y., Hou, X., Koch, C., Rehg, J., & Yuille, A. (2014). The secrets of salient object segmentation. In IEEE conference on computer vision and pattern recognition (CVPR).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (ECCV).
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., et al. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 353–367.
Mandler, G., & Shebo, B. J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111(1), 1.
Nath, S. K., Palaniappan, K., & Bunyak, F. (2006). Cell segmentation using coupled level sets and graph-vertex coloring. In Medical image computing and computer-assisted intervention (MICCAI).
Pahl, M., Si, A., & Zhang, S. (2013). Numerical cognition in bees and other insects. Frontiers in Psychology, 4, 162.
Peng, X., Sun, B., Ali, K., & Saenko, K. (2015). Learning deep object detectors from 3D models. In IEEE international conference on computer vision (ICCV).
Piazza, M., & Dehaene, S. (2004). From number neurons to mental arithmetic: The cognitive neuroscience of number sense. The Cognitive Neurosciences (3rd ed.), pp. 865–877.
Pinheiro, P. O., Lin, T.-Y., Collobert, R., & Dollár, P. (2016). Learning to refine object segments. In European conference on computer vision (ECCV).
Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., & Malik, J. (2017). Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 128–140.
Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In IEEE conference on computer vision and pattern recognition (CVPR), DeepVision Workshop.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.
Scharfenberger, C., Waslander, S. L., Zelek, J. S., & Clausi, D. A. (2013). Existence detection of objects in images for robot vision using saliency histogram features. In IEEE international conference on computer and robot vision (CRV).
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2014). OverFeat: Integrated recognition, localization and detection using convolutional networks. In International conference on learning representations (ICLR).
Shen, X., & Wu, Y. (2012). A unified approach to salient object detection via low rank matrix recovery. In IEEE conference on computer vision and pattern recognition (CVPR).
Shin, D., He, S., Lee, G. M., Whinston, A. B., Cetintas, S., & Lee, K.-C. (2016). Content complexity, similarity, and consistency in social media: A deep learning approach. https://ssrn.com/abstract=2830377.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).
Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Stark, M., Goesele, M., & Schiele, B. (2010). Back to the future: Learning shape models from 3D CAD data. In British Machine Vision Conference (BMVC).
Stoianov, I., & Zorzi, M. (2012). Emergence of a visual number sense in hierarchical generative models. Nature Neuroscience, 15(2), 194–196.
Subburaman, V. B., Descamps, A., & Carincotte, C. (2012). Counting people in the crowd using a generic head detector. In IEEE international conference on advanced video and signal-based surveillance (AVSS).
Sun, B., & Saenko, K. (2014). From virtual to reality: Fast adaptation of virtual object detectors to real domains. In British Machine Vision Conference (BMVC).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In IEEE conference on computer vision and pattern recognition (CVPR).
Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003). Context-based vision system for place and object recognition. In IEEE international conference on computer vision (ICCV).
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101(1), 80.
Vedaldi, A., & Fulkerson, B. (2008). VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.
Vuilleumier, P. O., & Rafal, R. D. (2000). A systematic study of visual extinction: Between- and within-field deficits of attention in hemispatial neglect. Brain, 123(6), 1263–1279.
Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., & Li, S. (2012). Salient object detection for searched web images via global saliency. In IEEE conference on computer vision and pattern recognition (CVPR).
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In IEEE conference on computer vision and pattern recognition (CVPR).
Xiong, B., & Grauman, K. (2014). Detecting snap points in egocentric video with a web photo prior. In European conference on computer vision (ECCV). Springer.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A. C., Salakhutdinov, R., et al. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (ICML).
Zhang, J., Ma, S., Sameki, M., Sclaroff, S., Betke, M., Lin, Z., et al. (2015a). Salient object subitizing. In IEEE conference on computer vision and pattern recognition (CVPR).
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., & Měch, R. (2015b). Minimum barrier salient object detection at 80 fps. In IEEE international conference on computer vision (ICCV).
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., & Měch, R. (2016). Unconstrained salient object detection via proposal subset optimization. In IEEE conference on computer vision and pattern recognition (CVPR).
Zhao, R., Ouyang, W., Li, H., & Wang, X. (2015). Saliency detection by multi-context deep learning. In IEEE conference on computer vision and pattern recognition (CVPR).
Zou, W. Y., & McClelland, J. L. (2013). Progressive development of the number sense in a deep neural network. In Annual conference of the Cognitive Science Society (CogSci).
Acknowledgements
This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA.
Additional information
Communicated by Antonio Torralba.
About this article
Cite this article
Zhang, J., Ma, S., Sameki, M. et al. Salient Object Subitizing. Int J Comput Vis 124, 169–186 (2017). https://doi.org/10.1007/s11263-017-1011-0
DOI: https://doi.org/10.1007/s11263-017-1011-0