Article, DOI: 10.5555/3016100.3016309

TGSum: build tweet guided multi-document summarization dataset

Published: 12 February 2016

Abstract

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect news-related multi-document summaries on a large scale by drawing on social media reactions. We utilize two types of social labels in tweets: hashtags and hyperlinks. Hashtags are used to cluster documents into topic sets, and a tweet with a hyperlink often highlights certain key points of the document it links to. For each cluster of linked documents, we synthesize a reference summary that covers most of these key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio and develop an Integer Linear Programming solution that finds the sentence set reaching the upper bound of ROUGE. Since summary sentences may be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both the informativeness and the readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on the DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the performance of the summarizer improves substantially on all the test sets. We release this dataset for further research.
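The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not give the ILP itself, so this example swaps in an exhaustive search over sentence subsets, which finds the same optimum on a toy input but does not scale. The function names (`bigrams`, `rouge2_recall`, `best_subset`), the word budget, and the example sentences are all assumptions made for illustration, and the score is a simplified ROUGE-2 recall computed on whitespace tokens.

```python
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Count lowercase whitespace-token bigrams in a sentence."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(selected, references):
    """Simplified ROUGE-2 recall: clipped bigram matches over reference bigrams."""
    candidate = Counter()
    for sentence in selected:
        candidate += bigrams(sentence)
    matched = total = 0
    for reference in references:
        ref_counts = bigrams(reference)
        matched += sum(min(n, candidate[g]) for g, n in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def best_subset(candidates, references, max_words):
    """Brute-force stand-in for the ILP: try every subset within the word budget.

    Smaller subsets are tried first, so ties are resolved in favour of
    shorter summaries.
    """
    best, best_score = [], 0.0
    for k in range(1, len(candidates) + 1):
        for idx in combinations(range(len(candidates)), k):
            chosen = [candidates[i] for i in idx]
            if sum(len(s.split()) for s in chosen) > max_words:
                continue
            score = rouge2_recall(chosen, references)
            if score > best_score:
                best, best_score = chosen, score
    return best, best_score


# Hypothetical toy data: tweets linking to articles in one hashtag cluster,
# and candidate sentences drawn from those articles.
tweets = ["quake hits city center", "rescue teams reach city center"]
candidates = [
    "A strong quake hits city center on Monday",
    "Rescue teams reach city center within hours",
    "Officials will hold a press briefing",
]
summary, score = best_subset(candidates, tweets, max_words=16)
print(summary, round(score, 3))
```

In this sketch the linked tweets play the role of the key points to be covered, and the candidate pool could equally include high-quality tweets alongside document sentences, as the abstract allows. An ILP solver would replace the subset enumeration when the candidate pool is large.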

Published In

AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
February 2016
4406 pages

Sponsors

Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press
