DOI: 10.1145/3240508.3241910
research-article

Images2Poem: Generating Chinese Poetry from Image Streams

Published: 15 October 2018
    Abstract

    Natural language generation from visual inputs has recently attracted extensive research attention. Generating poetry from visual content is an interesting but very challenging task. We introduce and address a new multimedia task: generating classical Chinese poetry from image streams. To solve it, we propose an Images2Poem model with a selection mechanism and an adaptive self-attention mechanism. The model first selects representative images to summarize the image stream. During decoding, it adaptively attends to information from either the source-side image stream or the target-side previously generated characters. In this way, it jointly summarizes the images and generates relevant, high-quality poetry from the image stream. Experimental results demonstrate the effectiveness of the proposed approach: our model outperforms the baselines on several human evaluation metrics.
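    The two mechanisms described in the abstract can be illustrated with a minimal sketch. It assumes dot-product attention, a per-image sigmoid score with top-k filtering for the selection step, and a scalar sigmoid gate that mixes the image-side and character-side contexts during decoding. The function names (`select_images`, `attend`, `adaptive_context`) and these exact formulations are hypothetical and are not taken from the paper, which may compute selection and gating differently.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_images(features: List[Vector], w: Vector, b: float, k: int) -> List[int]:
    """Hypothetical selection step: score each image with sigmoid(w.f + b),
    keep the k highest-scoring images, and return their indices in stream order."""
    scores = [sigmoid(dot(w, f) + b) for f in features]
    top = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Dot-product attention: softmax-weighted average of the memory vectors."""
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    return [sum(wt * m[d] for wt, m in zip(weights, memory)) for d in range(dim)]


def adaptive_context(h: Vector, images: List[Vector],
                     prev_chars: List[Vector], w_gate: Vector) -> Vector:
    """Hypothetical adaptive step: a scalar gate beta = sigmoid(w_gate.h) mixes the
    source-side (image) context with the target-side (generated character) context."""
    src = attend(h, images)
    if not prev_chars:  # first decoding step: no characters generated yet
        return src
    tgt = attend(h, prev_chars)
    beta = sigmoid(dot(w_gate, h))
    return [beta * s + (1.0 - beta) * t for s, t in zip(src, tgt)]


if __name__ == "__main__":
    # Toy 2-d "image features" for a stream of four images.
    stream = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.2]]
    kept = select_images(stream, w=[2.0, -1.0], b=0.0, k=2)
    print(kept)  # [0, 2]
    h = [1.0, 0.0]  # toy decoder state
    chars = [[0.5, 0.5]]  # toy embedding of one previously generated character
    print(adaptive_context(h, [stream[i] for i in kept], chars, w_gate=[0.0, 1.0]))
```

    Keeping the selected indices in stream order preserves the temporal order of the images, and the gate lets each decoding step lean toward the images (beta near 1) or toward the characters already written (beta near 0).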




    Information

    Published In

    MM '18: Proceedings of the 26th ACM international conference on Multimedia
    October 2018
    2167 pages
    ISBN: 9781450356657
    DOI: 10.1145/3240508
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. adaptive self-attention mechanism
    2. image streams
    3. poetry generation
    4. selection mechanism


    Conference

    MM '18
    MM '18: ACM Multimedia Conference
    October 22 - 26, 2018
    Seoul, Republic of Korea

    Acceptance Rates

    MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
    Overall Acceptance Rate 995 of 4,171 submissions, 24%


    Article Metrics

    • Downloads (last 12 months): 23
    • Downloads (last 6 weeks): 5


    Cited By

    • (2023) Poetry4painting: Diversified poetry generation for large-size ancient paintings based on data augmentation. Computers & Graphics 116, 206-215. DOI: 10.1016/j.cag.2023.07.029. Online publication date: Nov 2023.
    • (2022) Multi-Modal Experience Inspired AI Creation. Proceedings of the 30th ACM International Conference on Multimedia, 1445-1454. DOI: 10.1145/3503161.3548189. Online publication date: 10 Oct 2022.
    • (2022) DRIIS: Research on Automatic Recognition of Artistic Conception of Classical Poems Based on Deep Learning. International Journal of Cooperative Information Systems 31, 01n02. DOI: 10.1142/S0218843022500010. Online publication date: 15 Oct 2022.
    • (2022) Images2Poem in different contexts with Dual‐CharRNN. CAAI Transactions on Intelligence Technology 7, 4, 685-694. DOI: 10.1049/cit2.12089. Online publication date: 25 Mar 2022.
    • (2021) iPoet: interactive painting poetry creation with visual multimodal analysis. Journal of Visualization. DOI: 10.1007/s12650-021-00780-0. Online publication date: 19 Nov 2021.
    • (2019) Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation. Proceedings of the 27th ACM International Conference on Multimedia, 2341-2350. DOI: 10.1145/3343031.3350961. Online publication date: 15 Oct 2019.
    • (2019) Rhyming Knowledge-Aware Deep Neural Network for Chinese Poetry Generation. 2019 International Conference on Machine Learning and Cybernetics (ICMLC), 1-6. DOI: 10.1109/ICMLC48188.2019.8949208. Online publication date: Jul 2019.
    • (2019) Generating Diverse and Descriptive Image Captions Using Visual Paraphrases. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 4239-4248. DOI: 10.1109/ICCV.2019.00434. Online publication date: Oct 2019.
