Abstract
We propose a new class-specific image representation for image classification using multiple region detectors. The representation is designed to address increasing variation in object location and size within images of a class, a setting in which traditional spatial pyramid matching shows limited classification accuracy. We propose a region-division method that divides the image into two class-specific regions, called the class-specific region-of-interest (C-ROI) and the focal region (FR). Using multiple region detectors and mixing their responses appropriately avoids the problem of selecting the single region detector that gives the best classification accuracy for a given image class, and thereby yields better results than using only one region detector. Several scale-invariant region detectors are used to obtain the C-ROI and FR by considering their importance for a given image class. In experiments on several well-known datasets, the proposed method improved classification accuracy and achieved results better than or comparable to those of related methods.
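To make the idea of mixing detector responses concrete, the following is a minimal sketch and not the authors' implementation. It assumes each detector yields a non-negative response map of the same size as the image, that class-specific importance is expressed as one scalar weight per detector, and that a candidate region is the tightest box around strongly responding pixels. The function names `mix_responses` and `bounding_region`, the weights, and the relative threshold are all hypothetical; the abstract does not specify how the C-ROI and FR are actually constructed.

```python
import numpy as np

def mix_responses(responses, weights):
    """Weighted sum of per-detector response maps (hypothetical mixing rule)."""
    responses = np.asarray(responses, dtype=float)   # shape (K, H, W)
    weights = np.asarray(weights, dtype=float)       # shape (K,)
    weights = weights / weights.sum()                # normalise weights to sum to 1
    return np.tensordot(weights, responses, axes=1)  # shape (H, W)

def bounding_region(mixed, rel_threshold=0.5):
    """Tightest box around pixels with response >= rel_threshold * max (assumed rule)."""
    mask = mixed >= rel_threshold * mixed.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # Inclusive (top, bottom, left, right) indices.
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Toy example: two detectors that respond on different, overlapping patches.
det_a = np.zeros((6, 6))
det_a[1:3, 1:3] = 1.0
det_b = np.zeros((6, 6))
det_b[2:5, 2:5] = 1.0

# A class that trusts detector A more selects A's patch ...
box_a = bounding_region(mix_responses([det_a, det_b], [0.75, 0.25]))
print(box_a)  # (1, 2, 1, 2)

# ... while a class that trusts detector B more selects B's patch.
box_b = bounding_region(mix_responses([det_a, det_b], [0.25, 0.75]))
print(box_b)  # (2, 4, 2, 4)
```

In this toy setting the same pair of detectors produces different regions for different classes purely through the weights, which illustrates why mixing can sidestep the choice of a single best detector per class.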
Notes
We used only one basis because in experiments we observed that one basis is sufficient to represent the particular information of the given data.
In our experiments, the number of sub-regions has little effect on the classification performance.
About this article
Cite this article
Lee, HJ., Hong, KS. Class-specific image representation for image classification using multiple scale-invariant region detectors. Pattern Anal Applic 20, 717–732 (2017). https://doi.org/10.1007/s10044-016-0529-z
DOI: https://doi.org/10.1007/s10044-016-0529-z