DOI: 10.1145/2661829.2661908
research-article

Question Retrieval with High Quality Answers in Community Question Answering

Published: 03 November 2014

Abstract

This paper studies the problem of question retrieval in community question answering (CQA). To bridge the lexical gaps between questions, regarded as the biggest challenge in retrieval, state-of-the-art methods learn translation models using answers, under the assumption that questions and answers are parallel texts. In practice, however, questions and answers are far from "parallel". Indeed, they are heterogeneous both at the literal level and in user behaviors. In particular, there are a large number of low-quality answers, to which translation models are vulnerable. To address these problems, we propose a supervised question-answer topic modeling approach. The approach assumes that questions and answers share some common latent topics and are generated, following those topics, in a "question language" and an "answer language" respectively. The topics also determine an answer quality signal. Compared with translation models, our approach not only comprehensively models user behaviors on CQA portals, but also highlights the intrinsic heterogeneity of questions and answers. More importantly, it takes answer quality into account and is robust to noise in answers. Building on the topic modeling approach, we propose a topic-based language model that matches questions not only at the term level but also at the topic level. We conducted experiments on large-scale data from Yahoo! Answers and Baidu Knows. Experimental results show that the proposed model significantly outperforms state-of-the-art retrieval models in CQA.
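The abstract does not give the scoring formula, so the following is only a minimal sketch of what matching "on a term level and a topic level" can look like. It assumes a generic interpolation between a Dirichlet-smoothed unigram model and a topic mixture, in the style of LDA-based document models. It is not the authors' supervised model and it ignores answer quality. The topic distributions, the weight `lam`, the smoothing parameter `mu`, and all function names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def term_prob(word, doc_tokens, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed unigram probability of `word` given one archived question."""
    tf = Counter(doc_tokens)[word]
    p_coll = coll_counts.get(word, 0) / coll_len
    return (tf + mu * p_coll) / (len(doc_tokens) + mu)

def topic_prob(word, doc_topics, topic_words):
    """Topic-level probability: sum over topics z of P(word | z) * P(z | question)."""
    return sum(p_z * topic_words[z].get(word, 0.0) for z, p_z in doc_topics.items())

def score(query, doc_tokens, doc_topics, topic_words, coll_counts, coll_len, lam=0.7):
    """Query log-likelihood under the interpolated term/topic model (assumed form)."""
    total = 0.0
    for w in query:
        p = (lam * term_prob(w, doc_tokens, coll_counts, coll_len)
             + (1.0 - lam) * topic_prob(w, doc_topics, topic_words))
        total += math.log(p + 1e-12)  # small constant guards against log(0)
    return total

# Toy data (made up): two archived questions and two hand-written topics.
archive = [["repair", "notebook", "display"], ["bake", "chocolate", "cake"]]
topic_words = [
    {w: 1 / 6 for w in ["fix", "repair", "laptop", "notebook", "screen", "display"]},
    {w: 1 / 3 for w in ["bake", "chocolate", "cake"]},
]
doc_topics = [{0: 0.9, 1: 0.1}, {0: 0.1, 1: 0.9}]
coll_counts = Counter(tok for doc in archive for tok in doc)
coll_len = sum(coll_counts.values())

query = ["fix", "laptop", "screen"]  # shares no terms with either archived question
scores = [score(query, d, t, topic_words, coll_counts, coll_len)
          for d, t in zip(archive, doc_topics)]
term_only = [score(query, d, t, topic_words, coll_counts, coll_len, lam=1.0)
             for d, t in zip(archive, doc_topics)]
```

In this toy setup the query has no word in common with either archived question, so the term-only scores (`lam=1.0`) are tied, while the interpolated scores rank the hardware question above the baking one. This is the kind of lexical gap that topic-level matching is meant to bridge.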



    Published In

    CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
    November 2014
    2152 pages
    ISBN:9781450325981
    DOI:10.1145/2661829
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. answer quality
    2. community question answering
    3. question retrieval
    4. supervised topic model

    Acceptance Rates

    CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
