Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data

Sanchan, Nattapong; Aker, Ahmet; Bontcheva, Kalina

doi:10.1007/978-3-319-77116-8_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10762)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Usage of online textual media is steadily increasing. Every day, more news stories, blog posts and scientific articles are added to the online volumes. These are freely accessible and have been used extensively in multiple research areas, e.g. automatic text summarization, information retrieval and information extraction. Meanwhile, online debate forums have recently become popular but remain largely unexplored. As a result, there are insufficient annotated debate data available for research in this genre. In this paper, we collect and annotate debate data for an automatic summarization task. As in extractive gold-standard summary generation, our data mark the sentences worth including in a summary. Five human annotators performed this task. Inter-annotator agreement, based on semantic similarity, is 36% for Cohen's kappa and 48% for Krippendorff's alpha. We also implement an extractive summarization system for online debates and discuss prominent features for the task of summarizing online debate data automatically.
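The abstract reports agreement as Cohen's kappa and Krippendorff's alpha but does not give the computation. Below is a minimal Python sketch of both measures under stated assumptions, not the authors' implementation: each annotator's choices are assumed to be reduced to binary per-sentence labels (1 = include in summary, 0 = exclude), labels are compared by exact match rather than by the semantic-similarity matching the paper describes, and Cohen's kappa, a two-rater statistic, is assumed to be averaged over all annotator pairs to cover five annotators. All function names and the example labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations: Sequence[Sequence[Hashable]]) -> float:
    """Average Cohen's kappa over all annotator pairs (assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def krippendorff_alpha_nominal(annotations: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal labels, assuming no missing values."""
    o: Counter = Counter()  # coincidence matrix o[(c, k)]
    for values in zip(*annotations):  # one tuple of labels per item
        m = len(values)
        if m < 2:
            continue  # an item needs at least two labels to be pairable
        for i, c in enumerate(values):
            for j, k in enumerate(values):
                if i != j:
                    o[(c, k)] += 1 / (m - 1)
    n_c: Counter = Counter()  # marginal total per label
    for (c, _k), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    observed = sum(v for (c, k), v in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        return 1.0  # only one label occurs, so there is no disagreement to measure
    return 1 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical labels: five annotators, six sentences (1 = selected).
    labels = [
        [1, 0, 1, 0, 0, 1],
        [1, 0, 0, 0, 1, 1],
        [1, 1, 1, 0, 0, 1],
        [0, 0, 1, 0, 0, 1],
        [1, 0, 1, 1, 0, 0],
    ]
    print(mean_pairwise_kappa(labels), krippendorff_alpha_nominal(labels))
```

Alpha handles any number of annotators directly, whereas kappa needs an aggregation choice such as the pairwise mean assumed here. Because the sketch uses exact label matching, it should not be expected to reproduce the 36% and 48% figures, which depend on the paper's similarity-based procedure.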

Notes

1. http://www.debate.org
2. This dataset can be downloaded at https://goo.gl/3aicDN.
3. http://www.nltk.org/api/nltk.tag.html

Acknowledgments

This work was partially supported by the UK EPSRC Grant No. EP/I004327/1 and by the European Union under Grant Agreement No. 611233 PHEME. The authors would also like to thank Bangkok University for its support.

Author information

Authors and Affiliations

School of Information Technology and Innovation, Bangkok University, 9/1 Moo 5 Phaholyothin Road, Klong 1, Klong Luang, 12120, Pathumthani, Thailand
Nattapong Sanchan
Natural Language Processing Group, Department of Computer Science, The University of Sheffield, 211 Portobello, Sheffield, UK
Ahmet Aker & Kalina Bontcheva

Corresponding authors

Correspondence to Nattapong Sanchan, Ahmet Aker or Kalina Bontcheva.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Sanchan, N., Aker, A., Bontcheva, K. (2018). Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_37

DOI: https://doi.org/10.1007/978-3-319-77116-8_37
Published: 10 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science (R0)