Abstract
The Bag-of-Words (BoW) model, commonly used for image classification, has two strong limitations: visual words lack explicit meaning, and they are usually polysemous. This paper addresses these two limitations by introducing an intermediate representation based on semantic attributes. Specifically, two different approaches are proposed. Both consist of predicting a set of semantic attributes for entire images as well as for local image regions, and using these predictions to build intermediate-level features. Experiments on four challenging image databases (PASCAL VOC 2007, Scene-15, MSRCv2 and SUN-397) show that both approaches significantly improve the performance of the BoW model. Moreover, their combination achieves state-of-the-art results on several of these databases.
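To make the idea of an attribute-based intermediate representation concrete, the following is a minimal sketch and not the method of the paper. It assumes that each semantic attribute is scored by a linear classifier followed by a sigmoid applied to a BoW histogram, that region-level scores are max-pooled over regions, and that the final feature is the concatenation of image-level and pooled region-level scores. The function names, weights, and histograms are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Histogram = Sequence[float]  # a BoW histogram: one value per visual word


def attribute_scores(hist: Histogram,
                     weights: Sequence[Sequence[float]],
                     biases: Sequence[float]) -> List[float]:
    """Score every attribute on one histogram.

    Assumption for this sketch: one linear classifier per attribute,
    squashed to (0, 1) with a sigmoid.
    """
    scores = []
    for w, b in zip(weights, biases):
        z = sum(wi * hi for wi, hi in zip(w, hist)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def intermediate_feature(image_hist: Histogram,
                         region_hists: Sequence[Histogram],
                         weights: Sequence[Sequence[float]],
                         biases: Sequence[float]) -> List[float]:
    """Concatenate image-level scores with max-pooled region-level scores."""
    global_part = attribute_scores(image_hist, weights, biases)
    per_region = [attribute_scores(h, weights, biases) for h in region_hists]
    if per_region:
        # Max-pool each attribute over all regions (pooling choice is an assumption).
        local_part = [max(column) for column in zip(*per_region)]
    else:
        local_part = [0.0] * len(biases)
    return global_part + local_part


# Toy data: two hypothetical attributes over a vocabulary of three visual words.
weights = [[1.0, -1.0, 0.0], [0.0, 0.5, 0.5]]
biases = [0.0, -1.0]
image_hist = [2.0, 1.0, 1.0]
region_hists = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

feature = intermediate_feature(image_hist, region_hists, weights, biases)
print(len(feature))  # two image-level scores followed by two region-level scores
```

In such a pipeline, the resulting vector would be passed to a standard classifier in place of, or alongside, the raw BoW histogram. How the attribute predictors are trained and how the two levels are combined is specified in the paper itself, not in this sketch.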
Acknowledgement
This work was carried out in part under the Quaero Programme, funded by OSEO, the French State agency for innovation.
Cite this article
Su, Y., Jurie, F. Improving Image Classification Using Semantic Attributes. Int J Comput Vis 100, 59–77 (2012). https://doi.org/10.1007/s11263-012-0529-4
DOI: https://doi.org/10.1007/s11263-012-0529-4