A Sequential Model for Discourse Segmentation

Hernault, Hugo; Bollegala, Danushka; Ishizuka, Mitsuru

doi:10.1007/978-3-642-12116-6_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.
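The sequential-labeling view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the "B"/"C" label scheme (B = token begins a new elementary unit, C = token continues the current one), the example sentence, and the choice to ignore the sentence-initial position when scoring are all assumptions made for illustration. A CRF toolkit would typically be trained on per-token feature dictionaries like the ones built here; the training step itself is left out.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Hypothetical lexical (word) and syntactic (POS) features for token i."""
    feats: Dict[str, object] = {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "is_first": i == 0,
    }
    if i > 0:  # left context
        feats["prev_word"] = tokens[i - 1].lower()
        feats["prev_pos"] = pos_tags[i - 1]
    if i + 1 < len(tokens):  # right context
        feats["next_word"] = tokens[i + 1].lower()
        feats["next_pos"] = pos_tags[i + 1]
    return feats


def labels_to_segments(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Group tokens into units: "B" opens a new unit, "C" extends the last one."""
    segments: List[List[str]] = []
    for token, label in zip(tokens, labels):
        if label == "B" or not segments:
            segments.append([token])
        else:
            segments[-1].append(token)
    return segments


def boundary_f_score(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-score over "B" positions.

    Assumption: position 0 is skipped, since a sentence start is trivially
    a boundary and would inflate the score.
    """
    gold_b = {i for i, lab in enumerate(gold) if lab == "B" and i > 0}
    pred_b = {i for i, lab in enumerate(pred) if lab == "B" and i > 0}
    true_pos = len(gold_b & pred_b)
    precision = true_pos / len(pred_b) if pred_b else 0.0
    recall = true_pos / len(gold_b) if gold_b else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example sentence with invented gold and predicted labels.
tokens = ["The", "bank", "said", "it", "would", "cut", "rates",
          "because", "demand", "fell", "."]
pos_tags = ["DT", "NN", "VBD", "PRP", "MD", "VB", "NNS",
            "IN", "NN", "VBD", "."]
gold = ["B", "C", "C", "B", "C", "C", "C", "B", "C", "C", "C"]
pred = ["B", "C", "C", "C", "C", "C", "C", "B", "C", "C", "C"]

features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
segments = labels_to_segments(tokens, gold)
precision, recall, f_score = boundary_f_score(gold, pred)
```

In this toy case the prediction misses one of the two sentence-internal boundaries, so precision is perfect while recall is halved. The F-scores reported in the abstract summarize the same kind of trade-off over a whole corpus, although the exact evaluation protocol used in the paper may differ from this sketch.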

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Hugo Hernault, Danushka Bollegala & Mitsuru Ishizuka

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Hernault, H., Bollegala, D., Ishizuka, M. (2010). A Sequential Model for Discourse Segmentation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_26

DOI: https://doi.org/10.1007/978-3-642-12116-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)
