Composing simple image descriptions using web-scale n-grams

Published: 23 June 2011

Abstract

Studying natural language, and especially how people describe the world around them, can help us better understand the visual world. In turn, it can also help us in the quest to generate natural language that describes this world in a human manner. We present a simple yet effective approach to automatically compose image descriptions given computer-vision-based inputs and using web-scale n-grams. Unlike most previous work that summarizes or retrieves pre-existing text relevant to an image, our method composes sentences entirely from scratch. Experimental results indicate that it is viable to generate simple textual descriptions that are pertinent to the specific content of an image, while permitting creativity in the description -- making for more human-like annotations than previous approaches.
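
The abstract only summarizes the approach. Purely as an illustration of the general idea, and not the authors' actual pipeline, the sketch below shows how web-scale n-gram counts could be used to choose among candidate phrasings for objects returned by a vision system. All counts, words, and function names here are invented for the example.

# Hypothetical sketch: pick modifiers and prepositions for detected objects
# by maximizing (toy) n-gram counts standing in for web-scale statistics.

# Made-up counts; a real system would query a web-scale n-gram collection.
NGRAM_COUNTS = {
    ("brown", "dog"): 120_000,
    ("furry", "dog"): 45_000,
    ("brown", "grass"): 9_000,
    ("green", "grass"): 310_000,
    ("dog", "on", "the", "grass"): 15_000,
    ("dog", "in", "the", "grass"): 22_000,
}

def ngram_count(tokens):
    """Look up the count of a token sequence; unseen n-grams get a small floor."""
    return NGRAM_COUNTS.get(tuple(tokens), 1)

def best_modifier(adjectives, noun):
    """Choose the adjective that co-occurs most often with the noun."""
    return max(adjectives, key=lambda adj: ngram_count((adj, noun)))

def compose_sentence(obj1, adjs1, obj2, adjs2, prepositions=("on", "in")):
    """Compose 'the <adj> <obj1> <prep> the <adj> <obj2>' using n-gram counts
    as a crude fluency proxy."""
    adj1 = best_modifier(adjs1, obj1)
    adj2 = best_modifier(adjs2, obj2)
    prep = max(prepositions, key=lambda p: ngram_count((obj1, p, "the", obj2)))
    return f"the {adj1} {obj1} {prep} the {adj2} {obj2}"

if __name__ == "__main__":
    # e.g., vision detections: a dog (brown/furry) on grass (green/brown)
    print(compose_sentence("dog", ["brown", "furry"], "grass", ["green", "brown"]))

With the toy counts above, "brown" beats "furry" for "dog" and "in" beats "on", so the composed output would be "the brown dog in the green grass".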



Published In

CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning
June 2011
270 pages
ISBN: 9781932432923

Publisher

Association for Computational Linguistics

United States

