research-article

Exploiting neighborhood knowledge for single document summarization and keyphrase extraction

Authors:

Jianguo Xiao

ACM Transactions on Information Systems (TOIS), Volume 28, Issue 2

Article No.: 8, Pages 1 - 34

https://doi.org/10.1145/1740592.1740596

Published: 10 June 2010

Abstract

Document summarization and keyphrase extraction are two related tasks in the IR and NLP fields, and both of them aim at extracting condensed representations from a single text document. Existing methods for single document summarization and keyphrase extraction usually make use of only the information contained in the specified document. This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues. The specified document is expanded to a small document set by adding a few neighbor documents close to the document, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results on the Document Understanding Conference (DUC) benchmark datasets demonstrate the effectiveness and robustness of our proposed approaches. The cross-document sentence relationships in the expanded document set are validated to be beneficial to single document summarization, and the word cooccurrence relationships in the neighbor documents are validated to be very helpful to single document keyphrase extraction.
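The expansion-then-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity, the regex tokenizer and sentence splitter, the damping factor of 0.85, and every function name below (`bag`, `cosine`, `nearest_neighbors`, `rank_sentences`, `summarize`) are assumptions made for illustration. The sketch also treats within-document and cross-document sentence links identically, which is a simplification.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lowercased bag-of-words term-frequency vector (illustrative tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_neighbors(doc, corpus, k):
    """Return the k corpus documents most similar to the specified document."""
    target = bag(doc)
    return sorted(corpus, key=lambda d: cosine(target, bag(d)), reverse=True)[:k]


def split_sentences(doc):
    """Naive sentence splitter on ., ! and ? (an assumption, not a real segmenter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def rank_sentences(sentences, damping=0.85, iterations=50):
    """PageRank-style power iteration over a cosine-weighted sentence graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [bag(s) for s in sentences]
    weights = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(doc, corpus, k=2, top_n=1):
    """Rank sentences over the expanded set, then keep only the document's own."""
    neighbors = nearest_neighbors(doc, corpus, k)
    own = split_sentences(doc)
    expanded = own + [s for d in neighbors for s in split_sentences(d)]
    scores = rank_sentences(expanded)
    # Only sentences from the specified document are eligible for the summary.
    best = sorted(range(len(own)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [own[i] for i in sorted(best)]
```

Calling `summarize(doc, corpus, k=2, top_n=1)` returns the highest-scoring sentence of `doc`. Neighbor sentences contribute to the scores but are never selected, which mirrors the abstract's distinction between local information in the specified document and global information from its neighbors.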

Published In

ACM Transactions on Information Systems Volume 28, Issue 2

May 2010

165 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1740592

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 June 2010

Accepted: 01 April 2009

Revised: 01 November 2008

Received: 01 May 2008

Published in TOIS Volume 28, Issue 2

Qualifiers

Research-article
Research
Refereed
