DOI: 10.1145/1526709.1526720
research-article

Enhancing diversity, coverage and balance for summarization through structure learning

Published: 20 April 2009

Abstract

Document summarization plays an increasingly important role as the number of documents on the Web grows exponentially. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom consider summary diversity, coverage, and balance simultaneously, although these issues largely determine the quality of a summary. In this paper, we consider extract-based summarization with emphasis on three requirements: 1) diversity, which seeks to reduce redundancy among the sentences in the summary; 2) sufficient coverage, which aims to avoid losing the document's main information when generating the summary; and 3) balance, which demands that the different aspects of the document have about the same relative importance in the summary. We formulate extract-based summarization as learning a mapping from the set of sentences of a given document to a subset of those sentences that satisfies the three requirements. The mapping is learned by incorporating several constraints in a structure learning framework; we explore the graph structure of the output variables and employ a structural SVM to solve the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.
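
The three requirements can be made concrete with toy measures. The sketch below is an illustration only and is not the paper's formulation: the word-overlap scores, the function names (tokens, vocabulary, diversity, coverage, balance), the example sentences, and the hand-written aspect word sets are all assumptions introduced here. It scores a candidate extract on each requirement separately; it does not learn anything and does not use a structural SVM.

```python
from itertools import combinations
from typing import List, Set


def tokens(sentence: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for real sentence features."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {w for w in words if w}


def vocabulary(sentences: List[str]) -> Set[str]:
    """All distinct words appearing in a list of sentences."""
    vocab: Set[str] = set()
    for s in sentences:
        vocab |= tokens(s)
    return vocab


def diversity(summary: List[str]) -> float:
    """1 minus the mean pairwise Jaccard similarity of the summary sentences."""
    sims = []
    for a, b in combinations([tokens(s) for s in summary], 2):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 0.0)
    return 1.0 - sum(sims) / len(sims) if sims else 1.0


def coverage(document: List[str], summary: List[str]) -> float:
    """Fraction of the document's distinct words that the summary contains."""
    doc_words = vocabulary(document)
    if not doc_words:
        return 0.0
    return len(doc_words & vocabulary(summary)) / len(doc_words)


def balance(aspects: List[Set[str]], summary: List[str]) -> float:
    """Ratio of the least- to the most-covered aspect (1.0 means even)."""
    words = vocabulary(summary)
    shares = [len(a & words) / len(a) for a in aspects if a]
    if not shares or max(shares) == 0:
        return 0.0
    return min(shares) / max(shares)


# Hypothetical three-sentence document with two aspects (cats, dogs).
document = [
    "Cats sleep most of the day.",
    "Cats sleep during the day.",
    "Dogs need daily walks.",
]
aspects = [{"cats", "sleep"}, {"dogs", "walks"}]

redundant = document[:2]             # two near-duplicate sentences
varied = [document[0], document[2]]  # one sentence per aspect

for name, summary in (("redundant", redundant), ("varied", varied)):
    print(name, diversity(summary), coverage(document, summary),
          balance(aspects, summary))
```

On this toy input the extract with one sentence per aspect scores higher on all three measures than the extract made of two near-duplicate sentences, which is the intuition behind asking a summarizer to satisfy the requirements jointly rather than one at a time.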

Cited By

  • (2022) Automatic Text Summarization of Biomedical Text Data: A Systematic Review. Information 13:8 (393). DOI: 10.3390/info13080393. Online publication date: 19-Aug-2022
  • (2021) Live blog summarization. Language Resources and Evaluation. DOI: 10.1007/s10579-020-09513-5. Online publication date: 2-Jan-2021
  • (2021) Comparative Analysis of Models for Abstractive Text Summarization. International Conference on Innovative Computing and Communications (553-563). DOI: 10.1007/978-981-16-3071-2_45. Online publication date: 29-Aug-2021

      Published In

      WWW '09: Proceedings of the 18th international conference on World wide web
      April 2009
      1280 pages
      ISBN: 9781605584874
      DOI: 10.1145/1526709

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. balance
      2. coverage
      3. diversity
      4. structural svm
      5. summarization

      Qualifiers

      • Research-article

      Conference

      WWW '09

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Article Metrics

      • Downloads (Last 12 months): 21
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 03 Oct 2024

      Citations

      • (2020) An unsupervised semantic sentence ranking scheme for text documents. Integrated Computer-Aided Engineering 28:1 (17-33). DOI: 10.3233/ICA-200626. Online publication date: 21-Dec-2020
      • (2020) One-sided Differential Privacy. 2020 IEEE 36th International Conference on Data Engineering (ICDE) (493-504). DOI: 10.1109/ICDE48307.2020.00049. Online publication date: Apr-2020
      • (2020) Extractive Multi-Document Arabic Text Summarization Using Evolutionary Multi-Objective Optimization With K-Medoid Clustering. IEEE Access 8 (228206-228224). DOI: 10.1109/ACCESS.2020.3046494. Online publication date: 2020
      • (2020) Grouping sentences as a better language unit for extractive text summarization. Future Generation Computer Systems. DOI: 10.1016/j.future.2020.03.046. Online publication date: Apr-2020
      • (2020) Opinion Summarization. Opinion Mining in Information Retrieval (81-95). DOI: 10.1007/978-981-15-5043-0_6. Online publication date: 20-May-2020
      • (2020) Chinese–Vietnamese Bilingual News Event Summarization Based on Distributed Graph Ranking. Urban Intelligence and Applications (97-112). DOI: 10.1007/978-3-030-45099-1_8. Online publication date: 26-Jun-2020
      • (2019) Telegenic Data Analysis for Compact Description and Genre Classification based on Re-ranking. 2019 International Conference on Communication Technologies (ComTech) (102-107). DOI: 10.1109/COMTECH.2019.8737797. Online publication date: Mar-2019
