DOI: 10.1145/3297001.3297004
research-article

CQASUMM: Building References for Community Question Answering Summarization Corpora

Published: 03 January 2019

    Abstract

    Answers submitted to community question answering (CQA) forums are often elaborate, contain spam, and are marred by slurs and business promotions. It is difficult for a reader to go through numerous such answers to gauge community opinion, so summarization becomes a priority. However, there is a dearth of neural approaches to CQA summarization because no large-scale annotated dataset exists. We create CQASUMM, the first annotated CQA summarization dataset, by filtering the 4.4 million Yahoo! Answers L6 dataset. We sample threads where the best answer can double as a reference and build hundred-word summaries from them. We provide scripts to reconstruct the dataset and introduce the new task of Community Question Answering Summarization.
    Multi-document summarization (MDS) has been widely studied using news corpora. However, documents in CQA have higher variance and contain contradicting opinions. We compare popular MDS techniques and evaluate their performance on our CQA corpus. We find that most MDS workflows are built for entirely factual news corpora, whereas our corpus also has a fair share of opinion-based instances. We therefore introduce OpinioSumm, a new MDS method that outperforms the best baseline by 4.6% in terms of ROUGE-1 score.
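    The ROUGE-1 score mentioned above measures unigram overlap between a system summary and a reference summary. The following is a minimal sketch of that idea, not the evaluation code used for CQASUMM: the function name `rouge_1` and the whitespace tokenization are assumptions made for illustration, and the official ROUGE toolkit offers options such as stemming and stopword removal that are not reproduced here.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 (illustrative sketch).

    Assumption: tokens are lowercased and split on whitespace only;
    no stemming or stopword removal is applied.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: a token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_1("the cat sat", "the cat sat on the mat")
    print(scores)
```

    In this sketch the best answer of a thread would play the role of `reference` and the output of a summarizer the role of `candidate`; comparing two systems then amounts to comparing their averaged scores over all threads.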

        Published In

        CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
        January 2019
        380 pages
        ISBN: 9781450362078
        DOI: 10.1145/3297001
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Community Question Answering
        2. Multi Document Summarization
        3. Summarization Corpus
        4. Yahoo! Answers

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        CoDS-COMAD '19: 6th ACM IKDD CoDS and 24th COMAD
        January 3 - 5, 2019
        Kolkata, India

        Acceptance Rates

        CODS-COMAD '19 Paper Acceptance Rate 62 of 198 submissions, 31%;
        Overall Acceptance Rate 197 of 680 submissions, 29%

        Article Metrics

        • Downloads (Last 12 months): 7
        • Downloads (Last 6 weeks): 1
        Reflects downloads up to 27 Jul 2024

        Cited By

        • (2024) Summarizing Community-Based Question-Answer Pairs with Focus Rectification. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 11391-11395. DOI: 10.1109/ICASSP48485.2024.10446905. Online publication date: 14-Apr-2024
        • (2023) Taxonomy of Abstractive Dialogue Summarization: Scenarios, Approaches, and Future Directions. ACM Computing Surveys 56:3, 1-38. DOI: 10.1145/3622933. Online publication date: 5-Oct-2023
        • (2023) Multi-Document Summarization Using Selective Attention Span and Reinforcement Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 3457-3467. DOI: 10.1109/TASLP.2023.3316459. Online publication date: 2023
        • (2022) Deep reinforcement and transfer learning for abstractive text summarization. Computer Speech and Language 71:C. DOI: 10.1016/j.csl.2021.101276. Online publication date: 1-Jan-2022
        • (2021) A comprehensive review on feature set used for anaphora resolution. Artificial Intelligence Review 54:4, 2917-3006. DOI: 10.1007/s10462-020-09917-3. Online publication date: 1-Apr-2021
        • (2019) DiffQue. ACM Transactions on Intelligent Systems and Technology 10:4, 1-27. DOI: 10.1145/3337799. Online publication date: 24-Jul-2019
