Towards content-level coherence with aspect-guided summarization

Published: 22 March 2013

Abstract

    The TAC 2010 summarization track introduced a new task, aspect-guided summarization, which centers on textual aspects: particular kinds of information that a text conveys. We observe that aspect-guided summaries not only address highly specific user needs but also facilitate content-level coherence through the use of aspect information. In this article, we present a full-fledged approach to aspect-guided summarization with a focus on summary coherence. Our approach depends on two prerequisite subtasks: recognizing aspect-bearing sentences for sentence extraction, and modeling aspect-based coherence with a hidden Markov model (HMM) to predict a coherent sentence ordering. Using the manually annotated TAC 2010 and 2011 datasets, we validate the effectiveness of our proposed methods for these subtasks. Drawing on the empirical results, we then develop an aspect-guided summarizer built on a simple but robust base summarizer. With sentence selection guided by aspect information, our system is among the best on TAC 2011. With sentence ordering predicted by the aspect-based HMM, its summaries achieve good coherence.
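To make the ordering idea concrete, the following is a minimal sketch of how an HMM over aspects could rank candidate sentence orderings. It is not the authors' implementation: the aspect labels, the start and transition probabilities, the per-sentence aspect scores, and the function names `sequence_likelihood` and `best_ordering` are all hypothetical values chosen for illustration. Hidden states are aspects, each sentence contributes an emission score per aspect (for example from an aspect classifier), and the forward algorithm scores each permutation.

```python
import itertools

# Hypothetical aspect inventory and HMM parameters, invented for illustration.
ASPECTS = ["WHAT", "WHY", "COUNTERMEASURES"]
START = {"WHAT": 0.8, "WHY": 0.1, "COUNTERMEASURES": 0.1}
TRANS = {
    "WHAT": {"WHAT": 0.3, "WHY": 0.5, "COUNTERMEASURES": 0.2},
    "WHY": {"WHAT": 0.1, "WHY": 0.3, "COUNTERMEASURES": 0.6},
    "COUNTERMEASURES": {"WHAT": 0.1, "WHY": 0.1, "COUNTERMEASURES": 0.8},
}


def sequence_likelihood(emissions):
    """Forward algorithm: emissions[t][a] scores sentence t under aspect a."""
    alpha = {a: START[a] * emissions[0][a] for a in ASPECTS}
    for e in emissions[1:]:
        alpha = {
            b: e[b] * sum(alpha[a] * TRANS[a][b] for a in ASPECTS)
            for b in ASPECTS
        }
    return sum(alpha.values())


def best_ordering(sentences):
    """Exhaustive search over permutations; feasible only for short summaries."""
    return max(
        itertools.permutations(sentences),
        key=lambda order: sequence_likelihood([scores for _, scores in order]),
    )


# Each sentence is paired with assumed aspect scores (not real classifier output).
sentences = [
    ("Officials ordered new inspections.",
     {"WHAT": 0.1, "WHY": 0.1, "COUNTERMEASURES": 0.8}),
    ("A bridge collapsed on Monday.",
     {"WHAT": 0.8, "WHY": 0.1, "COUNTERMEASURES": 0.1}),
    ("Corroded cables were blamed.",
     {"WHAT": 0.1, "WHY": 0.8, "COUNTERMEASURES": 0.1}),
]

for text, _ in best_ordering(sentences):
    print(text)
```

With these assumed parameters, the search places the WHAT sentence first, then WHY, then COUNTERMEASURES. Exhaustive permutation search grows factorially with summary length, so a practical system would need an approximate search; which search strategy the article uses is not stated in the abstract.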

    Cited By

    • (2024) A topic modeling‐based bibliometric exploration of automatic summarization research. WIREs Data Mining and Knowledge Discovery. DOI: 10.1002/widm.1540. Online publication date: 25-Apr-2024
    • (2020) Deep Learning Approaches to Text Production. Synthesis Lectures on Human Language Technologies 13:1 (1-199). DOI: 10.2200/S00979ED1V01Y201912HLT044. Online publication date: 20-Mar-2020
    • (2017) Recent advances in document summarization. Knowledge and Information Systems 53:2 (297-336). DOI: 10.1007/s10115-017-1042-4. Online publication date: 1-Nov-2017

      Published In

      ACM Transactions on Speech and Language Processing   Volume 10, Issue 1
      March 2013
      50 pages
      ISSN:1550-4875
      EISSN:1550-4883
      DOI:10.1145/2442076
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 March 2013
      Accepted: 01 January 2013
      Revised: 01 January 2013
      Received: 01 February 2012
      Published in TSLP Volume 10, Issue 1

      Author Tags

      1. Summarization
      2. coherence
      3. content model
      4. textual aspect

      Qualifiers

      • Research-article
      • Research
      • Refereed
