DOI: 10.1145/3340531.3411909
Research article

Soap: Soaking Capacity Optimization for Multi-Document Summarization

Published: 19 October 2020
  • Abstract

    Multi-document summarization (MDS) aims to produce a brief summary of a cluster of related documents. In this paper, we cast the MDS task as an optimization problem whose objective function is a novel measure named soaking capacity. Our method originates from the classic hypothesis that summary components are the sinks of information diffusion. We point out that this hypothesis only specifies the role a summary plays, not how well a summary fulfills that role. To fill this gap, soaking capacity is formally defined to quantify a summary's ability to soak up information, and we explicitly demonstrate its fitness as an indicator of both the saliency and the diversity goals of MDS. To solve the optimization problem, we propose a greedy algorithm named Soap, which adopts a surrogate of soaking capacity to accelerate the computation. Experiments on MDS datasets across various domains show the great potential of Soap compared with state-of-the-art MDS systems.
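
    The abstract casts summary extraction as maximizing a soaking-capacity objective and solving it greedily. As a rough illustration of that idea, not the paper's actual formulation, the sketch below scores a candidate summary by how much random-walk probability mass it absorbs when its sentences are treated as sinks of a sentence-similarity walk, and adds sentences greedily. The similarity input, the finite-horizon absorption proxy, and the names soaking_score/greedy_soap are assumptions made purely for illustration.

    # Illustrative sketch only: NOT the paper's exact objective or surrogate.
    # A candidate summary is scored by how much probability mass it "soaks up"
    # when its sentences are made absorbing, and sentences are chosen greedily.
    import numpy as np

    def transition_matrix(similarity):
        """Row-normalize a non-negative sentence-similarity matrix into a random walk."""
        sim = np.array(similarity, dtype=float)
        np.fill_diagonal(sim, 0.0)                    # no self-loops
        rows = sim.sum(axis=1, keepdims=True)
        rows[rows == 0.0] = 1.0                       # isolated sentences keep all-zero rows
        return sim / rows

    def soaking_score(P, summary, horizon=50):
        """Proxy score: probability mass absorbed by `summary` within `horizon` steps,
        starting uniformly from the remaining (transient) sentences."""
        if not summary:
            return 0.0
        transient = [i for i in range(P.shape[0]) if i not in summary]
        if not transient:
            return 1.0
        Q = P[np.ix_(transient, transient)]           # transient-to-transient transitions
        mass = np.full(len(transient), 1.0 / len(transient))
        for _ in range(horizon):                      # mass not yet absorbed
            mass = mass @ Q
        return 1.0 - mass.sum()                       # mass soaked up by the summary

    def greedy_soap(similarity, budget):
        """Greedily add the sentence whose inclusion most increases the soaking score."""
        P = transition_matrix(similarity)
        summary = []
        for _ in range(budget):
            candidates = [i for i in range(P.shape[0]) if i not in summary]
            if not candidates:
                break
            best = max(candidates, key=lambda i: soaking_score(P, summary + [i]))
            summary.append(best)
        return summary

    For example, greedy_soap(S, budget=3) on a sentence-by-sentence cosine-similarity matrix S would return three sentence indices; the paper's actual Soap algorithm instead uses a surrogate of soaking capacity to keep each greedy step cheap.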

    Supplementary Material

    MP4 File (3340531.3411909.mp4)
    This video is a presentation of the associated paper "Soap: Soaking Capacity Optimization for Multi-Document Summarization" (full research paper at CIKM '20). Specifically, the video presents the positioning, principle, theoretical analysis, and empirical performance of the proposed method. More details can be found in the slides and the original paper.


    Published In

    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
    October 2020
    3619 pages
    ISBN: 9781450368599
    DOI: 10.1145/3340531
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. multi-document summarization
    2. soaking capacity optimization

    Qualifiers

    • Research-article

    Conference

    CIKM '20

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Total citations: 0
    • Total downloads: 149
    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 11 Aug 2024

