research-article

Semantics-preserving bag-of-words models and applications

Published: 01 July 2010

Abstract

The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation tasks. One critical limitation of existing BoW models is that much semantic information is lost during codebook generation, an important step of BoW. This is because the codebook is typically built simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not form clusters in Euclidean space, primarily because of the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme to learn optimized BoW models that map semantically related features to the same visual words. In particular, we treat the distance between semantically identical features as a measure of the semantic gap, and attempt to learn an optimized codebook by minimizing this gap, aiming for minimal loss of semantics. We refer to such a codebook as a semantics-preserving codebook (SPC) and to the corresponding model as the Semantics-Preserving Bag-of-Words (SPBoW) model. Extensive experiments on image annotation and object detection tasks, using public testbeds from MIT's LabelMe and the PASCAL VOC challenge databases, show that the proposed SPC learning scheme is effective for optimizing the codebook generation process and that the SPBoW model greatly enhances the performance of the existing BoW model.
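The abstract contrasts the conventional BoW pipeline, where the codebook comes from clustering features in Euclidean space, with a codebook that keeps semantically identical features on the same visual word. The toy sketch below illustrates that contrast in standard-library Python. It is not the authors' SPC learning algorithm: as a stand-in for a learned metric it uses a diagonal simplification of relevant component analysis (RCA), which rescales each dimension by the inverse square root of its within-label variance. The helpers `kmeans_codebook`, `bow_histogram`, `diagonal_rca_weights`, and `same_word_rate` are hypothetical, and the four 2-D "features" are made up rather than real image descriptors.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration only; diagonal RCA is a swapped-in
# stand-in for metric learning, not the paper's SPC learning method.
Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(f: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword (visual word) closest to feature f."""
    return min(range(len(codebook)), key=lambda j: sq_dist(f, codebook[j]))


def kmeans_codebook(features: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(features[0])]
    while len(centers) < k:
        # Next seed: the feature farthest from every center chosen so far.
        far = max(features, key=lambda f: min(sq_dist(f, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in features:
            clusters[nearest_word(f, centers)].append(f)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(image_features: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized histogram of visual-word counts for one image."""
    counts = [0.0] * len(codebook)
    for f in image_features:
        counts[nearest_word(f, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def diagonal_rca_weights(features: Sequence[Vector], labels: Sequence[str],
                         eps: float = 1e-9) -> Vector:
    """Per-dimension weights 1/sqrt(within-label variance), a diagonal RCA-style metric."""
    groups: Dict[str, List[Vector]] = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    dim = len(features[0])
    scatter = [0.0] * dim
    for members in groups.values():
        mean = [sum(col) / len(members) for col in zip(*members)]
        for f in members:
            for d in range(dim):
                scatter[d] += (f[d] - mean[d]) ** 2
    return [1.0 / math.sqrt(s / len(features) + eps) for s in scatter]


def same_word_rate(features: Sequence[Vector], labels: Sequence[str],
                   codebook: Sequence[Vector]) -> float:
    """Fraction of same-label feature pairs quantized to the same visual word."""
    words = [nearest_word(f, codebook) for f in features]
    n = len(features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if labels[i] == labels[j]]
    if not pairs:
        raise ValueError("need at least one same-label pair")
    return sum(words[i] == words[j] for i, j in pairs) / len(pairs)


# Toy data: dimension 0 carries the label, dimension 1 is large nuisance variation.
feats = [[0.0, -10.0], [0.2, 10.0], [1.0, -10.0], [1.2, 10.0]]
labels = ["A", "A", "B", "B"]

euclid_cb = kmeans_codebook(feats, k=2)
print(same_word_rate(feats, labels, euclid_cb))  # 0.0

w = diagonal_rca_weights(feats, labels)
scaled = [[wd * x for wd, x in zip(w, f)] for f in feats]
metric_cb = kmeans_codebook(scaled, k=2)
print(same_word_rate(scaled, labels, metric_cb))  # 1.0
```

On this toy data, Euclidean k-means splits the features along the large nuisance axis, so no same-label pair shares a visual word, whereas clustering after the per-dimension reweighting gives each label its own word. Real descriptors are high-dimensional and their labels are noisy, so treat this only as an illustration of why the choice of distance affects codebook semantics.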

Published In

IEEE Transactions on Image Processing, Volume 19, Issue 7
July 2010
275 pages

Publisher

IEEE Press

Publication History

Published: 01 July 2010
Revised: 11 January 2010
Received: 31 May 2009

Author Tags

  1. Bag-of-words models
  2. distance metric learning
  3. image annotation
  4. object representation
  5. semantic gap
