research-article

Images2Poem: Generating Chinese Poetry from Image Streams

Authors:

Xiaojun Wan, and

Zongming Guo

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

Pages 1967 - 1975

https://doi.org/10.1145/3240508.3241910

Published: 15 October 2018

Abstract

Natural language generation from visual inputs has recently attracted extensive research attention, and generating poetry from visual content is an interesting but very challenging instance of it. We propose and address the new multimedia task of generating classical Chinese poetry from image streams. For this problem, we present Images2Poem, a model with a selection mechanism and an adaptive self-attention mechanism. The model first selects representative images to summarize the image stream. During decoding, it adaptively attends to information from either the source-side image stream or the target-side previously generated characters. In this way, it jointly summarizes the images and generates relevant, high-quality poetry. Experimental results demonstrate the effectiveness of the proposed approach: our model outperforms the baselines across different human evaluation metrics.
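
The abstract names a selection mechanism and an adaptive attention mechanism without giving their equations, so the following is only a minimal Python sketch of the general idea, not the authors' model. It assumes that images and decoder states are plain feature vectors, that selection is a top-k choice under a linear score, and that the source-side and target-side contexts are mixed by a scalar sigmoid gate. The names select_images, attend, adaptive_context, scorer, and gate_w are all hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention: return (weights, weighted sum of keys)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return weights, context


def select_images(image_feats, scorer, k):
    """Assumed selection step: keep the k best-scoring images, in stream order."""
    scores = [dot(scorer, f) for f in image_feats]
    top = sorted(range(len(image_feats)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def adaptive_context(query, image_feats, prev_states, gate_w):
    """Assumed adaptive step: gate between image-side and text-side contexts."""
    _, src_ctx = attend(query, image_feats)   # source side: selected images
    _, tgt_ctx = attend(query, prev_states)   # target side: generated characters
    gate = 1.0 / (1.0 + math.exp(-dot(gate_w, query)))
    mixed = [gate * s + (1.0 - gate) * t for s, t in zip(src_ctx, tgt_ctx)]
    return gate, mixed


if __name__ == "__main__":
    # Toy 2-dimensional features; every value here is made up for illustration.
    stream = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.2]]
    kept = select_images(stream, scorer=[1.0, 0.5], k=2)
    selected = [stream[i] for i in kept]
    gate, ctx = adaptive_context([1.0, 0.0], selected, [[0.5, 0.5]], [0.3, -0.2])
    print(kept, round(gate, 3), [round(c, 3) for c in ctx])
```

A gate value near 1 makes the decoder lean on the image stream, while a value near 0 makes it lean on the characters generated so far. A trained model would learn scorer and gate_w rather than take them as fixed inputs.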

Index Terms

Images2Poem: Generating Chinese Poetry from Image Streams
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Natural language processing
      1. Natural language generation

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN: 9781450356657

DOI: 10.1145/3240508

General Chairs:
Susanne Boll, University of Oldenburg, Germany
Kyoung Mu Lee, Seoul National University, Korea
Jiebo Luo, University of Rochester, USA
Wenwu Zhu, Tsinghua University, China

Program Chairs:
Hyeran Byun, Yonsei University, Korea
Chang Wen Chen, State Univ. of New York at Buffalo, USA
Rainer Lienhart, University of Augsburg, Germany
Tao Mei, JD AI, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Total Citations: 8
Total Downloads: 492
Downloads (Last 12 months): 23
Downloads (Last 6 weeks): 5

Cited By

Chen J, Huang K, Zhu X, Qiu X, Wang H, Qin X (2023). Poetry4painting: Diversified poetry generation for large-size ancient paintings based on data augmentation. Computers & Graphics 116 (206-215). DOI: 10.1016/j.cag.2023.07.029. Online publication date: Nov-2023
https://doi.org/10.1016/j.cag.2023.07.029
Cao Q, Chen X, Song R, Jiang H, Yang G, Cao Z, Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L (2022). Multi-Modal Experience Inspired AI Creation. Proceedings of the 30th ACM International Conference on Multimedia (1445-1454). DOI: 10.1145/3503161.3548189. Online publication date: 10-Oct-2022
https://dl.acm.org/doi/10.1145/3503161.3548189
Cui M (2022). DRIIS: Research on Automatic Recognition of Artistic Conception of Classical Poems Based on Deep Learning. International Journal of Cooperative Information Systems 31:01n02. DOI: 10.1142/S0218843022500010. Online publication date: 15-Oct-2022
https://doi.org/10.1142/S0218843022500010
Yan J, Xie Y, Luan X (2022). Images2Poem in different contexts with Dual‐CharRNN. CAAI Transactions on Intelligence Technology 7:4 (685-694). DOI: 10.1049/cit2.12089. Online publication date: 25-Mar-2022
https://doi.org/10.1049/cit2.12089
Feng Y, Chen J, Huang K, Wong J, Ye H, Zhang W, Zhu R, Luo X, Chen W (2021). iPoet: interactive painting poetry creation with visual multimodal analysis. Journal of Visualization. DOI: 10.1007/s12650-021-00780-0. Online publication date: 19-Nov-2021
https://doi.org/10.1007/s12650-021-00780-0
Luo Y, Huang Z, Zhang Z, Wang Z, Li J, Yang Y, Amsaleg L, Huet B, Larson M, Gravier G, Hung H, Ngo C, Tsang Ooi W (2019). Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation. Proceedings of the 27th ACM International Conference on Multimedia (2341-2350). DOI: 10.1145/3343031.3350961. Online publication date: 15-Oct-2019
https://dl.acm.org/doi/10.1145/3343031.3350961
Yeh W, Chang Y, Li Y, Chang W (2019). Rhyming Knowledge-Aware Deep Neural Network for Chinese Poetry Generation. 2019 International Conference on Machine Learning and Cybernetics (ICMLC) (1-6). DOI: 10.1109/ICMLC48188.2019.8949208. Online publication date: Jul-2019
https://doi.org/10.1109/ICMLC48188.2019.8949208
Liu L, Tang J, Wan X, Guo Z (2019). Generating Diverse and Descriptive Image Captions Using Visual Paraphrases. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (4239-4248). DOI: 10.1109/ICCV.2019.00434. Online publication date: Oct-2019
https://doi.org/10.1109/ICCV.2019.00434
