research-article

Semantics-preserving bag-of-words models and applications

Authors:

Steven C. H. Hoi,

Nenghai Yu

IEEE Transactions on Image Processing, Volume 19, Issue 7

Pages 1908 - 1920

https://doi.org/10.1109/TIP.2010.2045169

Published: 01 July 2010

Abstract

The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation tasks. One critical limitation of existing BoW models is that much semantic information is lost during codebook generation, an important step of BoW. This is because the codebook is usually built simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not form clusters in Euclidean space, primarily because of the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme to learn optimized BoW models, which aims to map semantically related features to the same visual words. In particular, we consider the distance between semantically identical features as a measurement of the semantic gap, and attempt to learn an optimized codebook by minimizing this gap, so as to lose as little semantic information as possible. We refer to this kind of codebook as a semantics-preserving codebook (SPC) and to the corresponding model as the Semantics-Preserving Bag-of-Words (SPBoW) model. Extensive experiments on image annotation and object detection tasks with public testbeds from MIT's LabelMe and the PASCAL VOC challenge databases show that the proposed SPC learning scheme is effective for optimizing the codebook generation process, and that the SPBoW model greatly enhances the performance of the existing BoW model.
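The limitation described in the abstract can be illustrated with a small toy sketch. It is not the SPC learning procedure from the paper: the 2-D features, the function names, and the hand-picked per-dimension scaling (a stand-in for a learned distance metric) are all assumptions made for illustration. The sketch builds a two-word codebook by plain k-means in Euclidean space, then repeats the clustering after rescaling the features so that the semantically irrelevant dimension carries less weight.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, f):
    """Index of the visual word (cluster center) closest to feature f."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], f))

def kmeans(features, init_centers, iters=20):
    """Plain k-means (Lloyd iterations) from fixed initial centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for f in features:
            groups[nearest(centers, f)].append(f)
        for k, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bow_histogram(centers, features):
    """Count how many features of one image fall on each visual word."""
    hist = [0] * len(centers)
    for f in features:
        hist[nearest(centers, f)] += 1
    return hist

def rescale(features, scale):
    """Hypothetical stand-in for a learned metric: per-dimension scaling."""
    return [[s * x for s, x in zip(scale, f)] for f in features]

# Toy data (assumed): dimension 0 carries the semantics, dimension 1 is
# a large nuisance variation shared by both semantic classes.
class_a = [[0.0, 0.0], [0.1, 10.0], [0.0, 0.5], [0.1, 9.5]]
class_b = [[1.0, 0.2], [1.1, 9.8], [1.0, 0.0], [1.1, 10.0]]
features = class_a + class_b
init_idx = (0, 5)  # same initial centers for both runs

# Baseline: codebook from clustering in the raw Euclidean space.
plain_codebook = kmeans(features, [features[i] for i in init_idx])
plain_words_a = [nearest(plain_codebook, f) for f in class_a]
plain_words_b = [nearest(plain_codebook, f) for f in class_b]

# Same clustering after down-weighting the nuisance dimension.
scale = (1.0, 0.01)  # hand-picked here, not learned
scaled = rescale(features, scale)
scaled_codebook = kmeans(scaled, [scaled[i] for i in init_idx])
scaled_words_a = [nearest(scaled_codebook, f) for f in rescale(class_a, scale)]
scaled_words_b = [nearest(scaled_codebook, f) for f in rescale(class_b, scale)]

# BoW histograms of an "image" that contains only class-A features.
plain_hist_a = bow_histogram(plain_codebook, class_a)
scaled_hist_a = bow_histogram(scaled_codebook, rescale(class_a, scale))

print("Euclidean words, class A:", plain_words_a, "class B:", plain_words_b)
print("Rescaled words,  class A:", scaled_words_a, "class B:", scaled_words_b)
print("Histograms for class-A image:", plain_hist_a, scaled_hist_a)
```

In this toy setting the Euclidean codebook splits each semantic class across both visual words, while the rescaled space assigns each class to a single word. The paper's contribution is learning such a space from semantic labels rather than choosing it by hand.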

Information

Published In

IEEE Transactions on Image Processing Volume 19, Issue 7

July 2010

275 pages

ISSN: 1057-7149

Copyright © 2010.

Publisher

IEEE Press

Publication History

Published: 01 July 2010

Revised: 11 January 2010

Received: 31 May 2009
