DOI: 10.1145/2964284.2984068
research-article

ConTagNet: Exploiting User Context for Image Tag Recommendation

Published: 01 October 2016

Abstract

In recent years, deep convolutional neural networks have shown great success in single-label image classification. However, images usually have multiple labels associated with them, which may correspond to different objects or actions present in the image. In addition, a user assigns tags to a photo based not merely on the visual content but also on the context in which the photo was captured. Inspired by this, we propose a deep neural network that predicts multiple tags for an image based on its content as well as the context in which it was captured. The proposed model can be trained end-to-end and solves a multi-label classification problem. We evaluate the model on a dataset of 1,965,232 images drawn from the YFCC100M dataset provided by the organizers of the Yahoo-Flickr Grand Challenge. We observe a significant improvement in prediction accuracy after integrating user context, and the proposed model performs very well in the Grand Challenge.
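
The idea of scoring many tags at once from both image content and capture context can be illustrated with a toy example. The abstract does not describe the architecture, so the following is only a minimal sketch and not the authors' model: it assumes the two feature vectors are simply concatenated and passed through one linear layer with an independent sigmoid per tag. The function name `predict_tags`, the feature dimensions, the tag vocabulary, and all weights are hypothetical values chosen for illustration.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_tags(content_feat, context_feat, weights, biases, tags, k=2):
    """Return the k highest-scoring tags for one image.

    Assumption: content and context features are fused by concatenation,
    and each tag gets its own sigmoid output (multi-label, not softmax).
    """
    fused = content_feat + context_feat  # list concatenation of the two views
    scored = []
    for tag, w_row, b in zip(tags, weights, biases):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        scored.append((sigmoid(logit), tag))
    scored.sort(reverse=True)  # highest probability first
    return [tag for _, tag in scored[:k]]


# Hypothetical vocabulary and hand-set parameters (2 content dims + 2 context dims).
TAGS = ["beach", "sunset", "snow"]
WEIGHTS = [
    [2.0, -1.0, 0.5, 0.0],   # beach
    [0.5, 0.0, -1.0, 2.0],   # sunset
    [-2.0, 1.0, 0.0, -0.5],  # snow
]
BIASES = [0.0, 0.0, 0.0]

content = [1.0, 0.0]         # stand-in for CNN image features
evening = [0.0, 1.0]         # stand-in for a context signal such as time of day
no_context = [0.0, 0.0]

with_context = predict_tags(content, evening, WEIGHTS, BIASES, TAGS)
without_context = predict_tags(content, no_context, WEIGHTS, BIASES, TAGS)
print(with_context, without_context)
```

With these made-up weights the same image content ranks "sunset" first once the context signal is present and "beach" first without it, which is the kind of effect the abstract attributes to integrating user context.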


Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural networks
  2. image tags
  3. multi-label
  4. user context

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation Prime Minister's Office Singapore

Conference

MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate: 52 of 237 submissions (22%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)

Cited By

  • (2024) CNNRec. Engineering Applications of Artificial Intelligence 133:PA. DOI: 10.1016/j.engappai.2024.108062. Online publication date: 1-Jul-2024
  • (2024) Recent trends in recommender systems: a survey. International Journal of Multimedia Information Retrieval 13:4. DOI: 10.1007/s13735-024-00349-1. Online publication date: 10-Oct-2024
  • (2024) Deep learning approaches to address cold start and long tail challenges in recommendation systems: a systematic review. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-20262-3. Online publication date: 16-Oct-2024
  • (2024) Personalized Multi-User-Based Movie and Video Recommender System. Supervised and Unsupervised Data Engineering for Multimedia Data (149-175). DOI: 10.1002/9781119786443.ch7. Online publication date: Apr-2024
  • (2023) A Hybrid Deep Neural Network for Multimodal Personalized Hashtag Recommendation. IEEE Transactions on Computational Social Systems 10:5 (2439-2459). DOI: 10.1109/TCSS.2022.3184307. Online publication date: Oct-2023
  • (2023) Graph deep learning hashtag recommender for reels. 2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService) (119-126). DOI: 10.1109/BigDataService58306.2023.00023. Online publication date: Jul-2023
  • (2023) Deep Contextual Grid Triplet Network for Context-Aware Recommendation. IEEE Access 11 (97522-97537). DOI: 10.1109/ACCESS.2023.3310470. Online publication date: 2023
  • (2023) Textual tag recommendation with multi-tag topical attention. Neurocomputing 537 (73-84). DOI: 10.1016/j.neucom.2023.03.051. Online publication date: Jun-2023
  • (2023) DCARS: Deep context-aware recommendation system based on session latent context. Applied Soft Computing 143 (110416). DOI: 10.1016/j.asoc.2023.110416. Online publication date: Aug-2023
  • (2023) Study of AI-Driven Fashion Recommender Systems. SN Computer Science 4:5. DOI: 10.1007/s42979-023-01932-9. Online publication date: 5-Jul-2023