research-article

Question Retrieval with High Quality Answers in Community Question Answering

Authors:

Ming Zhou

CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management

Pages 371 - 380

https://doi.org/10.1145/2661829.2661908

Published: 03 November 2014

Abstract

This paper studies the problem of question retrieval in community question answering (CQA). To bridge the lexical gap between questions, which is regarded as the biggest challenge in retrieval, state-of-the-art methods learn translation models from questions and their answers under the assumption that the two are parallel texts. In practice, however, questions and answers are far from "parallel": they are heterogeneous both at the literal level and in terms of user behavior. Moreover, CQA archives contain a particularly large number of low-quality answers, to which translation models are vulnerable. To address these problems, we propose a supervised question-answer topic modeling approach. The approach assumes that questions and answers share common latent topics and, given those topics, are generated in a "question language" and an "answer language" respectively. The topics also determine an answer quality signal. Compared with translation models, our approach not only comprehensively models user behavior on CQA portals, but also captures the intrinsic heterogeneity of questions and answers. More importantly, it takes answer quality into account and is robust to noise in answers. Building on this topic model, we propose a topic-based language model that matches questions not only at the term level but also at the topic level. We conducted experiments on large-scale data from Yahoo! Answers and Baidu Knows. The results show that the proposed model significantly outperforms state-of-the-art retrieval models for CQA.
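The abstract describes a topic-based language model that matches questions at both the term level and the topic level, but does not give its formula. The following is only a minimal sketch of that general idea, written in the style of Wei and Croft's LDA-based document model: a Dirichlet-smoothed term-level model is linearly interpolated with a topic-level model, sum over k of P(w|z=k) P(z=k|d). It assumes the topic-word distributions `phi` and per-question topic mixtures `theta` were already estimated by some topic model; here they are hand-set toy values, not learned. It does not model answer quality or separate question/answer languages, and the function names, the parameters `mu` and `lam`, and all data are hypothetical.

```python
import math
from collections import Counter

def term_prob(word, doc_counts, doc_len, coll_prob, mu):
    """Term-level P(w | d) with Dirichlet smoothing against the collection model."""
    return (doc_counts.get(word, 0) + mu * coll_prob.get(word, 0.0)) / (doc_len + mu)

def topic_prob(word, theta_d, phi, vocab_index):
    """Topic-level P(w | d) = sum_k P(w | z=k) * P(z=k | d)."""
    j = vocab_index[word]
    return sum(theta_d[k] * phi[k][j] for k in range(len(theta_d)))

def score(query, doc, theta_d, phi, vocab_index, coll_prob, mu=1000.0, lam=0.7):
    """Log query likelihood under a linear mix of term- and topic-level models."""
    counts = Counter(doc)
    total = 0.0
    for w in query:
        if w not in vocab_index:  # skip words the topic model has never seen
            continue
        p = (lam * term_prob(w, counts, len(doc), coll_prob, mu)
             + (1.0 - lam) * topic_prob(w, theta_d, phi, vocab_index))
        total += math.log(max(p, 1e-12))  # floor avoids log(0)
    return total

# Toy data (hypothetical): two topics over a six-word vocabulary.
vocab = ["laptop", "battery", "charge", "notebook", "recipe", "cake"]
vocab_index = {w: i for i, w in enumerate(vocab)}
phi = [
    [0.3, 0.3, 0.2, 0.2, 0.0, 0.0],  # "hardware" topic
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],  # "cooking" topic
]
archive = {"q1": ["notebook", "battery", "charge"], "q2": ["cake", "recipe"]}
theta = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
words = [w for q in archive.values() for w in q]
coll_prob = {w: c / len(words) for w, c in Counter(words).items()}

query = ["laptop", "battery"]  # "laptop" never occurs in q1, which says "notebook"
ranked = sorted(
    archive,
    key=lambda q: score(query, archive[q], theta[q], phi, vocab_index, coll_prob, mu=2.0),
    reverse=True,
)
print(ranked)  # ['q1', 'q2']
```

Here q1 ranks first even though it never contains "laptop": the topic-level component gives "laptop" probability through the shared hardware topic, which is the kind of lexical gap the abstract refers to. Setting `lam=1.0` drops the topic component and leaves pure term matching.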

References

[1]

E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne. Finding high-quality content in social media. In WSDM'08, pages 183--194, 2008.

[2]

R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[3]

A. Berger, R. Caruana, D. Cohn, D. Freitag, and V. Mittal. Bridging the lexical chasm: statistical approaches to answer-finding. In SIGIR'00, pages 192--199, 2000.

[4]

D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR'03, 3:993--1022, 2003.

[5]

L. Cai, G. Zhou, K. Liu, and J. Zhao. Learning the latent topics for question retrieval in community QA. In IJCNLP'11, pages 273--281, 2011.

[6]

X. Cao, G. Cong, B. Cui, and C. S. Jensen. A generalized framework of exploring category information for question retrieval in community question answer archives. In WWW'10, pages 201--210, 2010.

[7]

X. Cao, G. Cong, B. Cui, C. S. Jensen, and C. Zhang. The use of categorization information in language models for question retrieval. In CIKM'09, pages 265--274, 2009.

[8]

T. Hofmann. Probabilistic latent semantic indexing. In SIGIR'99, pages 50--57, 1999.

[9]

J. Jeon, W. B. Croft, and J. H. Lee. Finding similar questions in large question and answer archives. In CIKM'05, pages 84--90, 2005.

[10]

J. Jeon, W. B. Croft, J. H. Lee, and S. Park. A framework to predict the quality of answers with non-textual features. In SIGIR'06, pages 228--235, 2006.

[11]

Z. Ji, F. Xu, B. Wang, and B. He. Question-answer topic model for question retrieval in community question answering. In CIKM'12, pages 2471--2474, 2012.

[12]

J. D. McAuliffe and D. M. Blei. Supervised topic models. In NIPS'07, pages 121--128, 2007.

[13]

P. McCullagh. Generalized linear models. European Journal of Operational Research, 16(3):285--292, 1984.

[14]

N. G. Polson and S. L. Scott. Data augmentation for support vector machines. Bayesian Analysis, 6(1):1--24, 2011.

[15]

S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In TREC, 1994.

[16]

T. Sakai, D. Ishikawa, N. Kando, Y. Seki, K. Kuriyama, and C.-Y. Lin. Using graded-relevance metrics for evaluating community QA answer selection. In WSDM'11, pages 187--196, 2011.

[17]

C. Shah and J. Pomerantz. Evaluating and predicting answer quality in community QA. In SIGIR'10, pages 411--418, 2010.

[18]

E. M. Voorhees. The TREC-8 question answering track report. In Proceedings of the 8th Text Retrieval Conference, pages 77--82, 1999.

[19]

X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. In SIGIR'06, pages 178--185, 2006.

[20]

X. Xue, J. Jeon, and W. B. Croft. Retrieval models for question and answer archives. In SIGIR'08, pages 475--482, 2008.

[21]

C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179--214, 2004.

[22]

B. Zhao and E. P. Xing. BiTAM: Bilingual topic admixture models for word alignment. In Proceedings of the COLING/ACL on Main conference poster sessions, pages 969--976, 2006.

[23]

J. Zhu, A. Ahmed, and E. P. Xing. MedLDA: maximum margin supervised topic models for regression and classification. In ICML'09, pages 1257--1264, 2009.

[24]

J. Zhu, N. Chen, H. Perkins, and B. Zhang. Gibbs max-margin topic models with fast sampling algorithms. In ICML'13, pages 124--132, 2013.

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

Question-answer topic model for question retrieval in community question answering
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

The major challenge for Question Retrieval (QR) in Community Question Answering (CQA) is the lexical gap between the queried question and the historical questions. This paper proposes a novel Question-Answer Topic Model (QATM) to learn the latent topics ...

Quality-aware collaborative question answering: methods and evaluation
WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

Community Question Answering (QA) portals contain questions and answers contributed by hundreds of millions of users. These databases of questions and answers are of great value if they can be used directly to answer questions from any user. In this ...

Summarizing Answers in Non-Factoid Community Question-Answering
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

We aim at summarizing answers in community question-answering (CQA). While most previous work focuses on factoid question-answering, we focus on the non-factoid question-answering. Unlike factoid CQA, non-factoid question-answering usually requires ...

Information

Published In

CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management

November 2014

2152 pages

ISBN:9781450325981

DOI:10.1145/2661829

General Chairs:
Jianzhong Li, Harbin Inst. of Technology
X. Sean Wang, Fudan University

Program Chairs:
Minos Garofalakis, Technical University of Crete, Greece
Ian Soboroff, National Institute of Standards, USA
Torsten Suel, New York University, USA
Min Wang, Google Research, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2014

Qualifiers

Research-article

Funding Sources

Fund of the State Key Laboratory of Software Development Environment
Ministry of Education of the People's Republic of China
Microsoft Research
National Natural Science Foundation of China

Conference

CIKM '14

Sponsor:

CIKM '14: 2014 ACM Conference on Information and Knowledge Management

November 3 - 7, 2014

Shanghai, China

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 56
Total Downloads: 1,303
Downloads (last 12 months): 58
Downloads (last 6 weeks): 6

Reflects downloads up to 20 Jan 2025

Citations

Cited By

Chong L, Ma D, Chen Y, Lv X (2025). Reusing Keywords for Fine-grained Representations and Matchings. Database Systems for Advanced Applications, pp. 83-98. Online publication date: 11-Jan-2025.
https://doi.org/10.1007/978-981-97-5779-4_6
Wang M, Cao J, Kong Q, Luo Y (2023). Boosting Domain-Specific Question Answering Through Weakly Supervised Self-Training. 2023 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 1-6. Online publication date: 2-Oct-2023.
https://doi.org/10.1109/ISI58743.2023.10297258
Costa G, Ortale R (2023). Ask and Ye shall be Answered. Information Fusion, 99:C. Online publication date: 1-Nov-2023.
https://dl.acm.org/doi/10.1016/j.inffus.2023.101856
Sun Y, Song J, Song X, Hou J (2023). Research on question retrieval method for community question answering. Multimedia Tools and Applications, 82:16, pp. 24309-24325. Online publication date: 17-Feb-2023.
https://dl.acm.org/doi/10.1007/s11042-023-14458-2
Liu Y, Tang W, Liu Z, Tang A, Zhang L (2023). Similar question retrieval with incorporation of multi-dimensional quality analysis for community question answering. Neural Computing and Applications, 36:7, pp. 3663-3679. Online publication date: 6-Dec-2023.
https://dl.acm.org/doi/10.1007/s00521-023-09266-6
Guo A, Li X, Pang N, Zhao X (2022). Adversarial Cross-domain Community Question Retrieval. ACM Transactions on Asian and Low-Resource Language Information Processing, 21:3, pp. 1-22. Online publication date: 10-Jan-2022.
https://dl.acm.org/doi/10.1145/3487291
Mohomed Jabbar M, Kumar L, Waqar Samuel H, Kim M, Prabharkar S, Goebel R, Zaiane O (2021). DeepDup: Duplicate Question Detection in Community Question Answering. Proceedings of the 2021 5th International Conference on Deep Learning Technologies, pp. 8-12. Online publication date: 23-Jul-2021.
https://dl.acm.org/doi/10.1145/3480001.3480021
Lelkes A, Tran V, Yu C (2021). Quiz-Style Question Generation for News Stories. Proceedings of the Web Conference 2021, pp. 2501-2511. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3449892
Liu Q, Do T, Cao L (2021). Answer Keyword Generation for Community Question Answering by Multiaspect Gamma–Poisson Matrix Completion. IEEE Intelligent Systems, 36:4, pp. 35-47. Online publication date: 1-Jul-2021.
https://dl.acm.org/doi/10.1109/MIS.2020.2997714
Seki Y, Oguni M, Fujita S (2020). Augmentation of Local Government FAQs using Community-based Question-answering Data. Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services, pp. 362-366. Online publication date: 30-Nov-2020.
https://dl.acm.org/doi/10.1145/3428757.3429137