research-article

Towards multi-semantic image annotation with graph regularized exclusive group lasso

Authors:

Tat-Seng Chua

MM '11: Proceedings of the 19th ACM international conference on Multimedia

Pages 263 - 272

https://doi.org/10.1145/2072298.2072334

Published: 28 November 2011

Abstract

To bridge the semantic gap between low-level features and human perception, most existing algorithms annotate images with concepts from only one semantic space, e.g. cognitive or affective. Naively combining the outputs from these spaces implicitly assumes conditional independence and ignores the correlations among the spaces. In this paper, to exploit the comprehensive semantics of images, we propose a general framework that integrates these multiple semantics, and we investigate the problem of learning to annotate images from training images labeled in two or more correlated semantic spaces, yielding annotations such as "fascinating nighttime" or "exciting cat". This kind of semantic annotation is better suited to real-world search scenarios. The proposed approach outperforms the baseline algorithms and makes the following contributions. 1) Unlike previous methods that annotate images within only one semantic space, our multi-semantic annotation associates each image with labels from multiple semantic spaces. 2) We develop a multi-task linear discriminative model that learns a linear mapping from features to labels. The tasks are coupled by two regularizers: exclusive group lasso regularization for competitive feature selection, and graph Laplacian regularization to cope with insufficient training samples. 3) We present a Nesterov-type smoothing approximation algorithm for efficient optimization of the model. Extensive experiments on the NUS-WIDE-Emotive dataset (56k images) with 8×81 emotive cognitive concepts, and on the Object&Scene datasets from NUS-WIDE, validate the effectiveness of the proposed approach.
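The abstract does not state the objective function, so the following is only a minimal sketch of how a multi-task linear model with the two regularizers it names might be scored. Everything specific here is an assumption for illustration: the squared loss, the particular form of the exclusive penalty (a squared l1 norm per feature within each task group, one common form of the exclusive lasso), a graph Laplacian defined over tasks, and the names `exclusive_group_penalty`, `laplacian_penalty`, `objective`, `lam`, and `gamma`. It is not the authors' formulation.

```python
import numpy as np

def exclusive_group_penalty(W, groups):
    """Assumed exclusive penalty: for each task group and each feature,
    take the l1 norm of the weights across the group's tasks, square it,
    and sum. This makes tasks in a group compete for each feature."""
    total = 0.0
    for g in groups:                          # g: list of task (column) indices
        row_l1 = np.abs(W[:, g]).sum(axis=1)  # per-feature l1 norm within the group
        total += float(np.sum(row_l1 ** 2))
    return total

def laplacian_penalty(W, L):
    """tr(W L W^T) for a Laplacian L over tasks (assumed): small when
    tasks connected in the graph have similar weight vectors."""
    return float(np.trace(W @ L @ W.T))

def objective(W, X, Y, groups, L, lam=0.1, gamma=0.1):
    """Squared loss (assumed) plus the two weighted regularizers.
    X: (n, d) features, Y: (n, K) labels, W: (d, K) linear mapping."""
    residual = X @ W - Y
    loss = 0.5 * float(np.sum(residual ** 2))
    return (loss
            + lam * exclusive_group_penalty(W, groups)
            + gamma * laplacian_penalty(W, L))
```

The exclusive term is non-smooth because of the absolute values, which is the kind of structure a smoothing approximation is designed to handle; the smoothing step itself is not sketched here.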

Index Terms

Towards multi-semantic image annotation with graph regularized exclusive group lasso
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia

November 2011

944 pages

ISBN:9781450306164

DOI:10.1145/2072298

General Chairs:
K. Selçuk Candan, Arizona State University, USA
Sethuraman Panchanathan, Arizona State University, USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Program Chairs:
Hari Sundaram, Arizona State University, USA
Wu-Chi Feng, Portland State University, USA
Nicu Sebe, University of Trento, Italy

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '11: ACM Multimedia Conference

Sponsor: SIGMM

November 28 - December 1, 2011

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Total Citations: 14
Total Downloads: 479
Downloads (last 12 months): 6
Downloads (last 6 weeks): 2

Reflects downloads up to 15 Oct 2024

Citations

Cited By

Yan C, Li X, Tao R, Zhang Z, Wan Y (2024). MeFiNet: Modeling multi-semantic convolution-based feature interactions for CTR prediction. Intelligent Data Analysis 28:1 (261-278). Online publication date: 3-Feb-2024.
https://doi.org/10.3233/IDA-227113
Bouchakwa M, Ayadi Y, Amous I (2020). A review on visual content-based and users’ tags-based image annotation: methods and techniques. Multimedia Tools and Applications. Online publication date: 9-May-2020.
https://doi.org/10.1007/s11042-020-08862-1
Ma X, Liu Q, Ou W, Zhou Q (2018). Visual object tracking via coefficients constrained exclusive group LASSO. Machine Vision and Applications 29:5 (749-763). Online publication date: 1-Jul-2018.
https://dl.acm.org/doi/10.1007/s00138-018-0930-2
Barnard K (2016). Computational Methods for Integrating Vision and Language. Synthesis Lectures on Computer Vision 6:1 (1-227). Online publication date: 20-Apr-2016.
https://doi.org/10.2200/S00705ED1V01Y201602COV007
Zhu Y, Zhong Z, Cao W, Cheng D (2016). Graph feature selection for dementia diagnosis. Neurocomputing 195:C (19-22). Online publication date: 26-Jun-2016.
https://dl.acm.org/doi/10.1016/j.neucom.2015.09.126
Xu X, Shimada A, Nagahara H, Taniguchi R (2016). Learning multi-task local metrics for image annotation. Multimedia Tools and Applications 75:4 (2203-2231). Online publication date: 1-Feb-2016.
https://dl.acm.org/doi/10.1007/s11042-014-2402-7
Xiangyu Chen, Yanwu Xu, Fengshou Yin, Zhuo Zhang, Wong D, Tien Yin Wong, Jiang Liu (2015). Multiple ocular diseases detection based on joint sparse multi-task learning. 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (5260-5263). Online publication date: Aug-2015.
https://doi.org/10.1109/EMBC.2015.7319578
Ringsquandl M, Lamparter S, Brandt S, Hubauer T, Lepratti R (2015). Semantic-Guided Feature Selection for Industrial Automation Systems. The Semantic Web - ISWC 2015 (225-240). Online publication date: 24-Oct-2015.
https://doi.org/10.1007/978-3-319-25010-6_13
Chen X, Xu Y, Yan S, Chua T, Wong D, Wong T, Liu J (2015). Discriminative Feature Selection for Multiple Ocular Diseases Classification by Sparse Induced Graph Regularized Group Lasso. Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2015 (11-19). Online publication date: 20-Nov-2015.
https://doi.org/10.1007/978-3-319-24571-3_2
Zomahoun D, Yetongnon K (2014). EMERGSEM. Proceedings of the 2014 Tenth International Conference on Signal-Image Technology and Internet-Based Systems (256-263). Online publication date: 23-Nov-2014.
https://dl.acm.org/doi/10.1109/SITIS.2014.117