Deep Dependency Substructure-Based Learning for Multidocument Summarization

Authors:

Xiaojun Wan

ACM Transactions on Information Systems (TOIS), Volume 34, Issue 1

Article No.: 3, Pages 1 - 24

https://doi.org/10.1145/2766447

Published: 14 July 2015

Abstract

Most extractive topic-focused multidocument summarization systems generate a summary by ranking textual units in multiple documents and extracting a proper subset of sentences biased toward the given topic. Usually, the textual units are simply sentences or n-grams, which do not carry deep syntactic and semantic information. This article presents a novel extractive topic-focused multidocument summarization framework. The framework introduces a more meaningful and informative kind of unit, the frequent Deep Dependency Sub-Structure (DDSS), and a topic-sensitive Multi-Task Learning (MTL) model for ranking frequent DDSSs. Given a document set, we first parse all the sentences into deep dependency structures with a Head-driven Phrase Structure Grammar (HPSG) parser and mine the frequent DDSSs after semantic normalization. We then employ a topic-sensitive MTL model to learn the importance of these frequent DDSSs. Finally, we use an Integer Linear Programming (ILP) formulation, with the frequent DDSSs as the basic units, for summary extraction. Experimental results on two DUC datasets demonstrate that the proposed approach can achieve state-of-the-art performance. Both the DDSS information and the topic-sensitive MTL model are shown to be very helpful for topic-focused multidocument summarization.
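The mine-then-select pipeline described in the abstract can be illustrated with a small sketch. It is not the article's implementation and rests on several simplifying assumptions: each sentence is already reduced to a set of (head, relation, dependent) triples standing in for DDSSs, a unit's sentence-level support count stands in for the importance that the MTL model would learn, and exhaustive search over sentence subsets replaces an ILP solver. The names `frequent_units` and `select_summary` and all data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_units(sentences, min_support=2):
    """Keep units that occur in at least `min_support` sentences."""
    support = Counter()
    for units in sentences.values():
        support.update(set(units))  # count each unit once per sentence
    return {u: c for u, c in support.items() if c >= min_support}


def select_summary(sentences, lengths, weights, budget):
    """Pick the sentence subset that maximizes the total weight of the
    distinct units it covers, subject to a word budget.

    Exhaustive search is an assumption made for clarity; it is only
    feasible for a handful of sentences.
    """
    best_ids, best_score = (), 0
    ids = sorted(sentences)
    for r in range(1, len(ids) + 1):
        for subset in combinations(ids, r):
            if sum(lengths[i] for i in subset) > budget:
                continue  # violates the length constraint
            covered = set().union(*(sentences[i] for i in subset))
            # Each covered unit counts once, which discourages redundancy.
            score = sum(w for u, w in weights.items() if u in covered)
            if score > best_score:
                best_ids, best_score = subset, score
    return best_ids, best_score


# Hypothetical toy input: sentence id -> set of dependency triples.
sentences = {
    "s1": {("approve", "subj", "senate"), ("approve", "obj", "bill")},
    "s2": {("approve", "obj", "bill"), ("cut", "obj", "tax")},
    "s3": {("approve", "subj", "senate"), ("cut", "obj", "tax")},
    "s4": {("rain", "subj", "weather")},
}
lengths = {"s1": 8, "s2": 9, "s3": 7, "s4": 5}  # assumed word counts

weights = frequent_units(sentences, min_support=2)
chosen, score = select_summary(sentences, lengths, weights, budget=16)
print(chosen, score)
```

In this toy input the unit that appears in only one sentence is filtered out before selection, so the off-topic sentence `s4` contributes nothing to the objective. Because a covered unit is counted once no matter how many selected sentences contain it, the objective favors subsets that cover different units.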

Index Terms

Deep Dependency Substructure-Based Learning for Multidocument Summarization
1. Information systems
  1. Information retrieval
    1. Document representation
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction

Reviews

Reviewer: Fazli Can

Summarization aims to make the information content of a document efficiently accessible. In extractive summarization, sentences of a document are ranked by their importance and top-scoring sentences are selected according to a given summary length constraint. A similar approach is used in multidocument summarization; however, in this case, it is more probable to see the same information in several different surface forms. In focused summarization, the process is biased by the information need expressed in the form of a query. In this paper, the authors first introduce a concept called deep dependency substructure (DDSS) that aims to accurately reflect semantic relationships among words and then employ DDSSs for focused multidocument summarization. For DDSS ranking, the authors apply an approach based on integer linear programming by using not only frequent DDSSs, but also word bigrams; they pay special attention to the possible redundancy problem. In the experiments, several baselines and Document Understanding Conference (DUC) test collections are used. The experiments are comprehensive and the results are comparable to other state-of-the-art methods. The study provides a good new baseline and will be especially useful to researchers working on similar topics. (Online Computing Reviews Service)
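The generic extractive procedure the review opens with (rank sentences by importance, then take top-scoring ones until the length limit is reached) can be sketched as follows. This is an illustration under stated assumptions rather than any system discussed in the review: the importance scores are simply given as input, length is measured in whitespace-separated words, and the function name `greedy_extract` and the sample data are hypothetical.

```python
def greedy_extract(sentences, scores, max_words):
    """Select top-scoring sentences greedily under a word limit."""
    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:  # skip sentences that do not fit
            chosen.append(i)
            used += n_words
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical sentences with assumed importance scores.
sentences = [
    "The senate approved the tax bill.",
    "Officials met on Tuesday.",
    "The bill cuts taxes for small firms.",
    "It rained.",
]
scores = [0.9, 0.2, 0.7, 0.1]

summary = greedy_extract(sentences, scores, max_words=14)
print(summary)
```

This baseline scores each sentence independently, so it does nothing about the redundancy problem the review mentions; in multidocument input, two high-scoring sentences that state the same fact in different surface forms could both be selected.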

Information & Contributors

Information

Published In

ACM Transactions on Information Systems Volume 34, Issue 1

October 2015

172 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/2806674

Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 July 2015

Accepted: 01 April 2015

Revised: 01 March 2015

Received: 01 November 2014

Published in TOIS Volume 34, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Hi-Tech Research and Development Program (863 Program) of China
National Natural Science Foundation of China
Beijing Nova Program

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 8
Total Downloads: 583
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 0

Reflects downloads up to 04 Oct 2024

Citations

Cited By

Zhao S, Wu R, Tao J, Qu M, Zhao M, Fan C, Zhao H (2022). perCLTV: A General System for Personalized Customer Lifetime Value Prediction in Online Games. ACM Transactions on Information Systems 41:1 (1-29). Online publication date: 23-Apr-2022.
https://dl.acm.org/doi/10.1145/3530012
Liu H, Guo Q, Zhu H, Zhuang F, Yang S, Dou D, Xiong H (2022). Who will Win the Data Science Competition? Insights from KDD Cup 2019 and Beyond. ACM Transactions on Knowledge Discovery from Data 16:5 (1-24). Online publication date: 5-Apr-2022.
https://dl.acm.org/doi/10.1145/3511896
Chootong C, Shih T, Ochirbat A, Sommool W, Zhuang Y (2021). An attention enhanced sentence feature network for subtitle extraction and summarization. Expert Systems with Applications 178 (114946). Online publication date: Sep-2021.
https://doi.org/10.1016/j.eswa.2021.114946
Chen X, Song X, Ren R, Zhu L, Cheng Z, Nie L (2020). Fine-Grained Privacy Detection with Graph-Regularized Hierarchical Attentive Representation Learning. ACM Transactions on Information Systems 38:4 (1-26). Online publication date: 16-Sep-2020.
https://dl.acm.org/doi/10.1145/3406109
Feng X, Liu M, Liu J, Qin B, Sun Y, Liu T (2018). Topic-to-essay generation with neural networks. Proceedings of the 27th International Joint Conference on Artificial Intelligence (4078-4084). Online publication date: 13-Jul-2018.
https://dl.acm.org/doi/10.5555/3304222.3304337
Ren P, Chen Z, Ren Z, Wei F, Nie L, Ma J, de Rijke M (2018). Sentence Relations for Extractive Summarization with Deep Neural Networks. ACM Transactions on Information Systems 36:4 (1-32). Online publication date: 30-Apr-2018.
https://dl.acm.org/doi/10.1145/3200864
Ren P, Chen Z, Ren Z, Wei F, Ma J, de Rijke M, Kando N, Sakai T, Joho H, Li H, de Vries A, White R (2017). Leveraging Contextual Sentence Relations for Extractive Summarization Using a Neural Attention Model. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (95-104). Online publication date: 7-Aug-2017.
https://dl.acm.org/doi/10.1145/3077136.3080792
Zhang W, Ma C, Yu M, Liu C, Wang Y (2017). N-SVDD: A sensitive message analysis model for mobile forensics. 2017 IEEE Conference on Application, Information and Network Security (AINS) (48-53). Online publication date: Nov-2017.
https://doi.org/10.1109/AINS.2017.8270423
