Multilevel Image Coding with Hyperfeatures

International Journal of Computer Vision

Abstract

Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant with good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics over scales larger than the local input patches. We present a multilevel visual representation that remedies this. The starting point is the notion that to detect object parts in images, in practice it often suffices to detect co-occurrences of more local object fragments. This can be formalized by coding image patches against a codebook of known fragments or a more general statistical model and locally histogramming the resulting labels to capture their co-occurrence statistics. Local patch descriptors are converted into somewhat less local histograms over label occurrences. The histograms are themselves local descriptor vectors so the process can be iterated to code ever larger assemblies of object parts and increasingly abstract or ‘semantic’ image properties. We call these higher-level descriptors “hyperfeatures”. We formulate the hyperfeature model and study its performance under several different image coding methods including k-means based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks.
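The coding loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a dense grid of patch descriptors, uses a tiny k-means vector quantizer as the coder at every level, and learns each codebook from the single input grid for brevity (in practice codebooks would presumably be learned from a training set). The function names, the window and stride values, and the choice to output one global label histogram per level are all assumptions made for illustration.

```python
import numpy as np

def vq_labels(X, centres):
    """Hard vector quantization: index of the nearest centre for each row of X."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, iters=10, seed=0):
    """Tiny k-means (illustrative only): returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = vq_labels(X, centres)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centre
                centres[j] = members.mean(axis=0)
    return centres

def local_histograms(label_grid, k, window=3, stride=2):
    """Histogram labels over sliding windows, giving a coarser grid of k-dim vectors."""
    H, W = label_grid.shape
    rows = range(0, H - window + 1, stride)
    cols = range(0, W - window + 1, stride)
    out = np.zeros((len(rows), len(cols), k))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            patch = label_grid[i:i + window, j:j + window].ravel()
            out[a, b] = np.bincount(patch, minlength=k) / patch.size
    return out

def hyperfeatures(desc_grid, ks=(8, 6), window=3, stride=2):
    """Code, histogram locally, and repeat; return per-level global histograms."""
    level_feats = []
    grid = desc_grid
    for k in ks:
        H, W, D = grid.shape
        X = grid.reshape(-1, D)
        centres = kmeans(X, k)                      # hypothetical per-image codebook
        labels = vq_labels(X, centres).reshape(H, W)
        # Global histogram of this level's labels (an assumed output choice).
        level_feats.append(np.bincount(labels.ravel(), minlength=k) / labels.size)
        # Local histograms become the descriptor vectors of the next level.
        grid = local_histograms(labels, k, window, stride)
    return np.concatenate(level_feats)

# Usage: a random 9x9 grid of 4-dimensional descriptors stands in for real patches.
demo_grid = np.random.default_rng(1).normal(size=(9, 9, 4))
demo_feats = hyperfeatures(demo_grid)
```

Each pass shrinks the grid while widening the image region that every vector summarizes, which is how larger-scale co-occurrence statistics enter the representation. Swapping the hard assignment in `vq_labels` for soft Gaussian mixture posteriors would correspond to one of the other coding methods the abstract mentions.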

Author information

Corresponding author

Correspondence to Ankur Agarwal.

About this article

Cite this article

Agarwal, A., Triggs, B. Multilevel Image Coding with Hyperfeatures. Int J Comput Vis 78, 15–27 (2008). https://doi.org/10.1007/s11263-007-0072-x

  • DOI: https://doi.org/10.1007/s11263-007-0072-x
