Composing simple image descriptions using web-scale n-grams

Published: 23 June 2011

Abstract

Studying natural language, and especially how people describe the world around them, can help us better understand the visual world. In turn, it can also help us in the quest to generate natural language that describes this world in a human manner. We present a simple yet effective approach to automatically compose image descriptions given computer-vision-based inputs and using web-scale n-grams. Unlike most previous work that summarizes or retrieves pre-existing text relevant to an image, our method composes sentences entirely from scratch. Experimental results indicate that it is viable to generate simple textual descriptions that are pertinent to the specific content of an image, while permitting creativity in the description -- making for more human-like annotations than previous approaches.
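
The abstract only summarizes the approach. Purely as an illustration of the general idea, and not the authors' actual pipeline, the sketch below shows how web-scale n-gram counts could be used to choose among candidate phrasings for objects returned by a vision system. All counts, words, and function names here are invented for the example.

# Hypothetical sketch: pick modifiers and prepositions for detected objects
# by maximizing (toy) n-gram counts standing in for web-scale statistics.

# Made-up counts; a real system would query a web-scale n-gram collection.
NGRAM_COUNTS = {
    ("brown", "dog"): 120_000,
    ("furry", "dog"): 45_000,
    ("brown", "grass"): 9_000,
    ("green", "grass"): 310_000,
    ("dog", "on", "the", "grass"): 15_000,
    ("dog", "in", "the", "grass"): 22_000,
}

def ngram_count(tokens):
    """Look up the count of a token sequence; unseen n-grams get a small floor."""
    return NGRAM_COUNTS.get(tuple(tokens), 1)

def best_modifier(adjectives, noun):
    """Choose the adjective that co-occurs most often with the noun."""
    return max(adjectives, key=lambda adj: ngram_count((adj, noun)))

def compose_sentence(obj1, adjs1, obj2, adjs2, prepositions=("on", "in")):
    """Compose 'the <adj> <obj1> <prep> the <adj> <obj2>' using n-gram counts
    as a crude fluency proxy."""
    adj1 = best_modifier(adjs1, obj1)
    adj2 = best_modifier(adjs2, obj2)
    prep = max(prepositions, key=lambda p: ngram_count((obj1, p, "the", obj2)))
    return f"the {adj1} {obj1} {prep} the {adj2} {obj2}"

if __name__ == "__main__":
    # e.g., vision detections: a dog (brown/furry) on grass (green/brown)
    print(compose_sentence("dog", ["brown", "furry"], "grass", ["green", "brown"]))

With the toy counts above, "brown" beats "furry" for "dog" and "in" beats "on", so the composed output would be "the brown dog in the green grass".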



Published In

CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning
June 2011
270 pages
ISBN: 9781932432923

Publisher

Association for Computational Linguistics

United States

