Article

TGSum: build tweet guided multi-document summarization dataset

Authors:

Ming Zhou

AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence

Pages 2906 - 2912

Published: 12 February 2016

Abstract

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large-scale, news-related multi-document summaries by drawing on social media's reactions to the news. We utilize two types of social labels in tweets: hashtags and hyperlinks. Hashtags are used to cluster documents into topic sets, while a tweet containing a hyperlink often highlights certain key points of the linked document. For each document cluster, we synthesize a reference summary that covers as many of these key points as possible. To this end, we adopt the ROUGE metrics to measure the coverage ratio and develop an Integer Linear Programming solution to find the sentence set that reaches the upper bound of ROUGE. Since summary sentences may be selected from both the documents and high-quality tweets, the generated reference summaries can be abstractive. Manual judgment verifies both the informativeness and the readability of the collected summaries. In addition, we train a Support Vector Regression summarizer on the DUC generic multi-document summarization benchmarks. Using the collected data as an extra training resource substantially improves the summarizer's performance on all the test sets. We release this dataset for further research.
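
The central step described above can be illustrated with a small sketch. The sketch finds the subset of candidate sentences whose ROUGE score against the tweet-highlighted key points is as high as possible under a length limit. Several details are assumptions made for illustration and are not taken from the paper:

- The reference side is taken to be the linked tweets themselves.
- ROUGE is simplified to clipped bigram (ROUGE-2) recall over lowercase whitespace tokens, with no stemming or stopword handling.
- The length limit is a word budget.
- The helper names `bigrams`, `rouge2_recall`, and `oracle_summary` are hypothetical.

The candidate pool may mix document sentences and tweets, mirroring the abstract's note that summary sentences can come from both. To stay dependency-free, the sketch solves the same 0/1 selection problem by exhaustive enumeration rather than with an Integer Linear Programming solver. This is only practical for a handful of candidates.

```python
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Count lowercase whitespace-token bigrams (simplified ROUGE-2 units)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate_sents, reference_texts):
    """Clipped bigram recall of the candidate sentences against the references."""
    ref = Counter()
    for text in reference_texts:
        ref.update(bigrams(text))
    cand = Counter()
    for sent in candidate_sents:
        cand.update(bigrams(sent))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference bigram is credited at most as often as it occurs in the references.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


def oracle_summary(candidates, references, max_words):
    """Exhaustively find the subset with the highest ROUGE-2 recall within the word budget.

    Hypothetical stand-in for an ILP solver: it enumerates all 2**n subsets,
    so it only works for very small candidate pools.
    """
    best_subset, best_score = (), -1.0
    for k in range(len(candidates) + 1):
        for subset in combinations(range(len(candidates)), k):
            length = sum(len(candidates[i].split()) for i in subset)
            if length > max_words:
                continue  # violates the length constraint
            score = rouge2_recall([candidates[i] for i in subset], references)
            if score > best_score:
                best_subset, best_score = subset, score
    return list(best_subset), best_score


candidates = [
    "the cat sat on the mat",  # hypothetical document sentence
    "a dog barked loudly",     # hypothetical document sentence
    "the cat sat",             # hypothetical tweet
]
tweets = ["the cat sat on the mat today"]  # hypothetical linked tweet
print(oracle_summary(candidates, tweets, max_words=6))  # ([0], 0.8333333333333334)
```

Because enumeration grows as 2^n in the number of candidates, an ILP formulation with binary sentence variables and a length constraint is what makes this upper-bound search tractable at dataset scale.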

Cited By

Chowdhury T, Chakraborty T, Dey L, Chaudhury S, Krishnapuram R, Singla P, Roy R (2019). CQASUMM. Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 18-26. doi:10.1145/3297001.3297004. Online publication date: 3-Jan-2019.
https://dl.acm.org/doi/10.1145/3297001.3297004

Nguyen M, Tran D, Nguyen L, Phan X (2018). Exploiting User Posts for Web Document Summarization. ACM Transactions on Knowledge Discovery from Data 12(4), 1-28. doi:10.1145/3186566. Online publication date: 8-Jun-2018.
https://dl.acm.org/doi/10.1145/3186566

Yao J, Wan X, Xiao J (2017). Recent advances in document summarization. Knowledge and Information Systems 53(2), 297-336. doi:10.1007/s10115-017-1042-4. Online publication date: 1-Nov-2017.
https://dl.acm.org/doi/10.1007/s10115-017-1042-4

Index Terms

Computing methodologies > Artificial intelligence > Natural language processing

Information

Published In

AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 2016, 4406 pages

Sponsor

Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Bibliometrics

Total citations: 3
Total downloads: 0 (last 12 months: 0; last 6 weeks: 0)

Reflects downloads up to 03 Sep 2024
