
Comparative Document Summarization via Discriminative Sentence Selection

Published: 01 March 2013

Abstract

Given a collection of document groups, a natural task is to identify the differences among them. Traditional document summarization techniques can summarize the content of each group one by one, but there remains a clear need for summaries of the differences among the groups. In this article, we study the novel problem of summarizing the differences between document groups. We propose a discriminative sentence selection method that extracts the most discriminative sentences, those that represent the specific characteristics of each document group. Experiments and case studies on real-world data sets demonstrate the effectiveness of the proposed method.
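
To make the idea of selecting group-specific sentences concrete, the following is a minimal sketch only. The abstract does not describe the article's actual selection criterion, so the scoring used here (an add-one smoothed log-odds of each word appearing inside versus outside a group, averaged per sentence) is an assumption chosen for illustration. The names `tokenize` and `select_discriminative`, and the sample data, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def select_discriminative(groups, k=1):
    """Pick the k sentences per group whose words are most group-specific.

    groups: dict mapping a group name to a list of sentences.
    The score is a smoothed log-odds heuristic, assumed for illustration;
    it is not the method proposed in the article.
    """
    # Word counts per group and over the whole collection.
    counts = {g: Counter(w for s in sents for w in tokenize(s))
              for g, sents in groups.items()}
    total = Counter()
    for c in counts.values():
        total.update(c)
    vocab_size = len(total)
    total_tokens = sum(total.values())

    def word_score(word, g):
        # Add-one smoothed probability of the word inside vs. outside group g.
        in_tokens = sum(counts[g].values())
        out_tokens = total_tokens - in_tokens
        p_in = (counts[g][word] + 1) / (in_tokens + vocab_size)
        p_out = (total[word] - counts[g][word] + 1) / (out_tokens + vocab_size)
        return math.log(p_in / p_out)

    selected = {}
    for g, sents in groups.items():
        scored = []
        for s in sents:
            words = tokenize(s)
            if not words:
                continue
            # Average over words so long sentences are not favoured by length.
            score = sum(word_score(w, g) for w in words) / len(words)
            scored.append((score, s))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected[g] = [s for _, s in scored[:k]]
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: one sentence is shared by both groups.
    groups = {
        "cameras": ["The battery lasts all day.",
                    "The lens is sharp and the zoom is smooth."],
        "phones": ["The battery lasts all day.",
                   "The screen is bright and the calls are clear."],
    }
    for name, sentences in select_discriminative(groups, k=1).items():
        print(name, "->", sentences)
```

In this toy input the sentence shared by both groups carries no group-specific words, so the heuristic prefers the sentence unique to each group. A per-group summary produced by a conventional summarizer could just as easily return the shared sentence for both groups, which is the gap the abstract points to.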

      Published In

      ACM Transactions on Knowledge Discovery from Data, Volume 7, Issue 1
      March 2013
      122 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2435209
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 March 2013
      Accepted: 01 April 2012
      Revised: 01 April 2012
      Received: 01 June 2011
      Published in TKDD Volume 7, Issue 1

      Author Tags

      1. Comparative document summarization
      2. discriminative sentence selection

      Qualifiers

      • Research-article
      • Research
      • Refereed
