DOI: 10.1145/3366423.3380247
Research article

Generating Representative Headlines for News Stories

Published: 20 April 2020

Abstract

Millions of news articles are published online every day, and keeping up with them can be overwhelming for readers. Grouping articles that report the same event into news stories is a common way of assisting readers in their news consumption. However, efficiently and effectively generating a representative headline for each story remains a challenging research problem. Automatic summarization of a document set has been studied for decades, yet few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture the most information with the least redundancy, headlines aim to capture, in a short form, the information jointly shared by the story's articles while excluding information specific to any individual article.
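The shared-vs.-specific distinction can be made concrete with a toy sketch (not from the paper; the bag-of-words tokenization and the function name are illustrative only): a headline should draw on terms common to every article in a story, whereas a summary may draw on their union.

```python
def shared_and_union_terms(articles):
    """Return (terms common to every article, terms appearing in any article)."""
    token_sets = [set(a.lower().split()) for a in articles]
    shared = set.intersection(*token_sets)  # headline-worthy: jointly shared
    union = set.union(*token_sets)          # summary-worthy: full coverage
    return shared, union

# A toy three-article story about the same event:
story = [
    "quake hits city overnight rescue underway",
    "quake hits city magnitude six reported",
    "rescue teams respond after quake hits city",
]
shared, union = shared_and_union_terms(story)
print(sorted(shared))  # → ['city', 'hits', 'quake']
```

Terms such as "rescue" appear in some articles but not all, so they belong to the union (summary material) but not to the shared core (headline material).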
In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. The proposed approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates a massive unlabeled corpus, with a different quality-vs.-quantity balance at each level. We show that models trained within the multi-level pre-training framework outperform those trained only on a human-curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models incorporating this attention layer are robust to potential noise in news stories and outperform existing baselines on both clean and noisy datasets. We further enhance our model by incorporating human labels, and show that our distant supervision approach significantly reduces the demand for labeled data. Finally, to serve the research community, we publish the first manually curated benchmark dataset on headline generation for news stories, NewSHead, which contains 367K stories (each with 3-5 articles) and is 6.5 times larger than the current largest multi-document summarization dataset.
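The abstract describes the self-voting article attention only at a high level. A hypothetical NumPy sketch of the idea is below; the function name, the cosine-similarity voting rule, and the softmax aggregation are our assumptions for illustration, not the paper's actual architecture. The intuition it demonstrates: articles that agree with the rest of the story receive more support, so an off-topic article that slips into a story gets down-weighted, which is one plausible route to the claimed robustness to noise.

```python
import numpy as np

def self_voting_attention(article_vecs):
    """Weight each article by the 'votes' it receives from the other
    articles in its story: pairwise cosine similarity acts as a vote,
    and a softmax turns total support into attention weights."""
    X = article_vecs / np.linalg.norm(article_vecs, axis=1, keepdims=True)
    sim = X @ X.T                    # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)       # an article does not vote for itself
    votes = sim.sum(axis=1)          # total support each article receives
    e = np.exp(votes - votes.max())  # numerically stable softmax
    return e / e.sum()

# Three mutually similar article embeddings plus one off-topic outlier:
vecs = np.array([
    [1.00, 0.00, 0.0],
    [0.90, 0.10, 0.0],
    [0.95, 0.05, 0.0],
    [0.00, 0.00, 1.0],  # noisy, off-topic article
])
weights = self_voting_attention(vecs)
# The outlier receives no support from the others, so its weight is smallest.
```

In a full model, these weights would gate how much each article contributes when the decoder attends over the story, rather than being used directly as shown here.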



Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020, 3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Publisher

Association for Computing Machinery, New York, NY, United States
Conference

WWW '20: The Web Conference 2020
April 20-24, 2020, Taipei, Taiwan
Overall acceptance rate: 1,899 of 8,196 submissions, 23%

