CQASUMM: Building References for Community Question Answering Summarization Corpora

Authors: Tanya Chowdhury, Tanmoy Chakraborty

CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data

Pages 18 - 26

https://doi.org/10.1145/3297001.3297004

Published: 03 January 2019

Abstract

Answers submitted to community question answering (CQA) forums are often elaborate, contain spam, and are marred by slurs and business promotions. It is difficult for a reader to go through numerous such answers to gauge community opinion. As a result, summarization becomes a prioritized task. However, there is a dearth of neural approaches for CQA summarization due to the lack of a large-scale annotated dataset. We create CQASUMM, the first annotated CQA summarization dataset, by filtering the 4.4 million Yahoo! Answers L6 dataset. We sample threads where the best answer can double up as a reference and build hundred-word summaries from them. We provide scripts to reconstruct the dataset and introduce the new task of Community Question Answering Summarization.

Multi-document summarization (MDS) has been widely studied using news corpora. However, documents in CQA have higher variance and contain contradicting opinions. We compare popular MDS techniques and evaluate their performance on our CQA corpora. We find that most MDS workflows are built for entirely factual news corpora, whereas our corpus has a fair share of opinion-based instances too. We therefore introduce OpinioSumm, a new MDS method that outperforms the best baseline by 4.6% w.r.t. ROUGE-1 score.
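For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a system summary and a reference summary. The following is a minimal sketch, not the official ROUGE toolkit and not the evaluation code used in the paper. The tokenizer (lowercased alphanumeric runs) and the function names `tokenize` and `rouge_1` are assumptions made for illustration, and the abstract does not state whether the reported gain refers to recall, precision, or F1.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep alphanumeric runs only.
    # Real ROUGE implementations may also stem or drop stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate, reference):
    """Return unigram-overlap precision, recall, and F1 for one summary pair."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))

    # Counter intersection keeps the minimum count per token, so a word
    # repeated in the candidate is credited at most as often as it
    # appears in the reference (clipped matching).
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example pair: 5 of the 6 reference unigrams are matched.
    scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
    print(scores)
```

In a CQA setting such as the one described above, `reference` would be the summary built from the best answer and `candidate` the output of an MDS system run over the remaining answers in the thread.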

Index Terms

CQASUMM: Building References for Community Question Answering Summarization Corpora
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Information extraction

Published In

CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data

January 2019

380 pages

ISBN: 9781450362078

DOI: 10.1145/3297001

General Chairs: Lipika Dey (TCS Innovation Labs), Surajit Chaudhury (Microsoft Research)

Program Chairs: Raghu Krishnapuram (Robert Bosch Center, IISc Bangalore), Parag Singla (IIT Delhi)

Publications Chair: Rishiraj Saha Roy (Max Planck Institute for Informatics)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CoDS-COMAD '19

CoDS-COMAD '19: 6th ACM IKDD CoDS and 24th COMAD

January 3 - 5, 2019

Kolkata, India

Acceptance Rates

CODS-COMAD '19 Paper Acceptance Rate 62 of 198 submissions, 31%

Overall Acceptance Rate 197 of 680 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 5
Total Downloads: 147
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 1

Reflects downloads up to 27 Jul 2024

Citations

Cited By

Mei M, Hu Y, Deng Y, Zhang X, Li Y, You H (2024) Summarizing Community-Based Question-Answer Pairs with Focus Rectification. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 11391-11395. Online publication date: 14-Apr-2024
https://doi.org/10.1109/ICASSP48485.2024.10446905
Jia Q, Liu Y, Ren S, Zhu K (2023) Taxonomy of Abstractive Dialogue Summarization: Scenarios, Approaches, and Future Directions. ACM Computing Surveys 56:3, 1-38. Online publication date: 5-Oct-2023
https://dl.acm.org/doi/10.1145/3622933
Atri Y, Goyal V, Chakraborty T (2023) Multi-Document Summarization Using Selective Attention Span and Reinforcement Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 3457-3467. Online publication date: 2023
https://doi.org/10.1109/TASLP.2023.3316459
Alomari A, Idris N, Sabri A, Alsmadi I (2022) Deep reinforcement and transfer learning for abstractive text summarization. Computer Speech and Language 71:C. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1016/j.csl.2021.101276
Lata K, Singh P, Dutta K (2021) A comprehensive review on feature set used for anaphora resolution. Artificial Intelligence Review 54:4, 2917-3006. Online publication date: 1-Apr-2021
https://dl.acm.org/doi/10.1007/s10462-020-09917-3
Thukral D, Pandey A, Gupta R, Goyal V, Chakraborty T (2019) DiffQue. ACM Transactions on Intelligent Systems and Technology 10:4, 1-27. Online publication date: 24-Jul-2019
https://dl.acm.org/doi/10.1145/3337799
