research-article

ConTagNet: Exploiting User Context for Image Tag Recommendation

Authors:

Yogesh Singh Rawat,

Mohan S. Kankanhalli

MM '16: Proceedings of the 24th ACM international conference on Multimedia

Pages 1102 - 1106

https://doi.org/10.1145/2964284.2984068

Published: 01 October 2016

Abstract

In recent years, deep convolutional neural networks have shown great success in single-label image classification. However, images usually have multiple labels associated with them, which may correspond to different objects or actions present in the image. In addition, a user assigns tags to a photo based not merely on its visual content but also on the context in which the photo was captured. Inspired by this, we propose a deep neural network that predicts multiple tags for an image based on its content as well as the context in which it was captured. The proposed model can be trained end-to-end and solves a multi-label classification problem. We evaluate the model on a dataset of 1,965,232 images drawn from the YFCC100M dataset provided by the organizers of the Yahoo-Flickr Grand Challenge. We observe a significant improvement in prediction accuracy after integrating user context, and the proposed model performs very well in the Grand Challenge.
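The idea of scoring every tag from both content and context can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify. It assumes a simple fusion by concatenating a content feature vector with a context feature vector, followed by one independent sigmoid output per tag and a binary cross-entropy loss, which is a common way to set up multi-label classification. All feature values, weights, tag names, and function names below are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tag_probabilities(content_feat, context_feat, weights, biases):
    """Score each tag independently from fused content + context features.

    Fusion here is plain concatenation (an assumption for illustration).
    """
    fused = list(content_feat) + list(context_feat)
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        probs.append(sigmoid(logit))
    return probs


def top_k_tags(probs, tags, k):
    """Return the k tag names with the highest predicted probability."""
    ranked = sorted(zip(tags, probs), key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in ranked[:k]]


def multilabel_loss(probs, targets):
    """Mean binary cross-entropy over all tags (one 0/1 target per tag)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, t in zip(probs, targets):
        total += t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return -total / len(probs)


# Hypothetical toy setup: 2 content features, 2 context features, 3 tags.
TAGS = ["beach", "sunset", "snow"]
WEIGHTS = [
    [2.0, -1.0, 0.0, -2.0],  # beach
    [0.5, 0.0, 3.0, 0.0],    # sunset: relies heavily on the first context feature
    [-1.0, 2.0, 0.0, 3.0],   # snow
]
BIASES = [0.0, -1.0, 0.0]

content = [0.9, 0.1]          # stand-in for visual features
evening_context = [1.0, 0.0]  # stand-in for capture context (e.g. an "evening" flag)
no_context = [0.0, 0.0]

with_ctx = top_k_tags(tag_probabilities(content, evening_context, WEIGHTS, BIASES), TAGS, 2)
without_ctx = top_k_tags(tag_probabilities(content, no_context, WEIGHTS, BIASES), TAGS, 2)
print(with_ctx)     # ['sunset', 'beach']
print(without_ctx)  # ['beach', 'sunset']
```

In this toy setup the same visual features produce a different tag ranking once the context vector is supplied, which is the effect the abstract attributes to integrating user context.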

Index Terms

ConTagNet: Exploiting User Context for Image Tag Recommendation
1. Information systems
  1. Information systems applications
    1. Multimedia information systems
    2. Spatial-temporal systems
      1. Location based services

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia

October 2016

1542 pages

ISBN:9781450336031

DOI:10.1145/2964284

General Chairs: Alan Hanjalic (Delft University of Technology), Cees Snoek (Qualcomm Research Netherlands / University of Amsterdam), Marcel Worring (University of Amsterdam)

Moderator: Dick Bulterman (CWI / VU University Amsterdam)

Program Chairs: Benoit Huet (EURECOM), Aisling Kelliher (Virginia Tech), Yiannis Kompatsiaris (CERTH-ITI), Jin Li (Microsoft)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Research Foundation Prime Minister's Office Singapore

Conference

MM '16: ACM Multimedia Conference

October 15 - 19, 2016

Amsterdam, The Netherlands

Sponsor: SIGMM

Acceptance Rates

MM '16 Paper Acceptance Rate: 52 of 237 submissions, 22%

Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 49
Total Downloads: 749
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 5

Reflects downloads up to 12 Nov 2024
Citations

Cited By

Patel R, Thakkar P, Ukani V (2024). CNNRec. Engineering Applications of Artificial Intelligence, 133:PA. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1016/j.engappai.2024.108062
Kumar C, Chowdary C, Meena A (2024). Recent trends in recommender systems: a survey. International Journal of Multimedia Information Retrieval, 13:4. Online publication date: 10-Oct-2024.
https://doi.org/10.1007/s13735-024-00349-1
Jangid M, Kumar R (2024). Deep learning approaches to address cold start and long tail challenges in recommendation systems: a systematic review. Multimedia Tools and Applications. Online publication date: 16-Oct-2024.
https://doi.org/10.1007/s11042-024-20262-3
Jayaramu H, Maji S, Yahia H (2024). Personalized Multi‐User‐Based Movie and Video Recommender System. Supervised and Unsupervised Data Engineering for Multimedia Data, pages 149-175. Online publication date: Apr-2024.
https://doi.org/10.1002/9781119786443.ch7
Bansal S, Gowda K, Kumar N (2023). A Hybrid Deep Neural Network for Multimodal Personalized Hashtag Recommendation. IEEE Transactions on Computational Social Systems, 10:5, pages 2439-2459. Online publication date: Oct-2023.
https://doi.org/10.1109/TCSS.2022.3184307
Balineni S, Andreopoulos W (2023). Graph deep learning hashtag recommender for reels. 2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService), pages 119-126. Online publication date: Jul-2023.
https://doi.org/10.1109/BigDataService58306.2023.00023
Aftab S, Ramampiaro H, Langseth H, Ruocco M (2023). Deep Contextual Grid Triplet Network for Context-Aware Recommendation. IEEE Access, 11, pages 97522-97537. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3310470
Xu P, Xia M, Xiao L, Liu H, Liu B, Jing L, Yu J (2023). Textual tag recommendation with multi-tag topical attention. Neurocomputing, 537, pages 73-84. Online publication date: Jun-2023.
https://doi.org/10.1016/j.neucom.2023.03.051
Sohafi-Bonab J, Hosseinzadeh Aghdam M, Majidzadeh K (2023). DCARS: Deep context-aware recommendation system based on session latent context. Applied Soft Computing, 143 (110416). Online publication date: Aug-2023.
https://doi.org/10.1016/j.asoc.2023.110416
Shirkhani S, Mokayed H, Saini R, Chai H (2023). Study of AI-Driven Fashion Recommender Systems. SN Computer Science, 4:5. Online publication date: 5-Jul-2023.
https://doi.org/10.1007/s42979-023-01932-9