Abstract
Bag-of-words model (BOW) is inspired by the text classification problem, where a document is represented by an unsorted set of contained words. Analogously, in the object categorization problem, an image is represented by an unsorted set of discrete visual words (BOVW). In these models, relations among visual words are performed after dictionary construction. However, close object regions can have far descriptions in the feature space, being grouped as different visual words. In this paper, we present a method for considering geometrical information of visual words in the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual-space and a feature-space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual-space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Kadir, T., Van Gool, L.: A comparison of affine region detectors. IJCV 65(1-2), 43–72 (2005)
Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2005)
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: ECCV, pp. 1–22 (2004)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR, pp. 1–8 (2007)
Chum, O., Philbin, J., Sivic, J., Zisserman, A.: Automatic query expansion with a generative feature model for object retrieval. In: ICCV, pp. 1–8 (2007)
Carneiro, G., Jepson, A.: Flexible spatial models for grouping local image features. In: CVPR, vol. 2, pp. 747–754 (2004)
Dietterich, T., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes 2, 263–282 (1995)
Escalera, S., Pujol, O., Radeva, P.: On the decoding process in ternary error-correcting output codes. Transactions in PAMI 99 (2009)
Caltech 101, http://www.vision.caltech.edu/image_datasets/caltech101/
Caltech 256, http://www.vision.caltech.edu/image_datasets/caltech256/
Clustering package, http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mirza-Mohammadi, M., Escalera, S., Radeva, P. (2009). Contextual-Guided Bag-of-Visual-Words Model for Multi-class Object Categorization. In: Jiang, X., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2009. Lecture Notes in Computer Science, vol 5702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03767-2_91
Download citation
DOI: https://doi.org/10.1007/978-3-642-03767-2_91
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03766-5
Online ISBN: 978-3-642-03767-2
eBook Packages: Computer ScienceComputer Science (R0)