A framework to predict the quality of answers with non-textual features

Authors:

W. Bruce Croft,

Soyeon Park

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 228 - 235

https://doi.org/10.1145/1148170.1148212

Published: 06 August 2006

Abstract

New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework that uses non-textual features to predict the quality of documents. We also show that our quality measure can be successfully incorporated into a language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community-based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.
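The abstract does not say how the quality measure enters the retrieval model. As a minimal sketch, assuming the quality estimate is used as a document prior in a query-likelihood language model (one common way to combine such evidence, not necessarily the authors' formulation), ranking could look like the following. The function names, the Jelinek-Mercer smoothing, the `lam` parameter, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Log P(Q|D) with Jelinek-Mercer smoothing (an assumed smoothing choice)."""
    doc_tf = Counter(doc)
    col_tf = Counter(collection)
    doc_len, col_len = len(doc), len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_col = col_tf[term] / col_len
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, quality, lam=0.5):
    """Rank document ids by log P(D) + log P(Q|D).

    `quality` maps each document id to a hypothetical quality estimate
    in (0, 1], used here as the document prior P(D).
    """
    collection = [term for tokens in docs.values() for term in tokens]
    scores = {
        doc_id: math.log(quality[doc_id])
        + query_log_likelihood(query, tokens, collection, lam)
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: two answers with identical text but different quality estimates.
docs = {
    "a": ["reset", "password", "email"],
    "b": ["reset", "password", "email"],
}
quality = {"a": 0.1, "b": 0.9}
ranking = rank(["reset", "password"], docs, quality)
```

With identical text, the two answers receive the same query likelihood, so the ordering is decided entirely by the assumed quality prior.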

Index Terms

A framework to predict the quality of answers with non-textual features
1. Information systems
  1. Information retrieval
  2. Information storage systems

Information & Contributors

Information

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN:1595933697

DOI:10.1145/1148170

General Chair: Efthimis N. Efthimiadis, University of Washington

Program Chairs: Susan Dumais, Microsoft Research, Redmond; David Hawking, CSIRO ICT Centre, Canberra, Australia; Kalervo Järvelin, University of Tampere, Finland

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR06

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 229
Total Downloads: 1,726
Downloads (Last 12 months): 29
Downloads (Last 6 weeks): 2

Reflects downloads up to 03 Oct 2024
