DOI: 10.1145/1148170.1148212

A framework to predict the quality of answers with non-textual features

Published: 06 August 2006

Abstract

New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework that uses non-textual features to predict the quality of documents. We also show that our quality measure can be successfully incorporated into a language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community-based question answering service, where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.
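
The abstract gives no formulas, so the following is only a minimal sketch of one way a quality estimate from non-textual features could be folded into a language modeling-based ranker. It assumes a binary maximum-entropy (logistic) model for quality, a query-likelihood score with Jelinek-Mercer smoothing, and the quality probability used as a document prior. The feature name `clicks`, the weights, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def quality_probability(features, weights, bias=0.0):
    """Hypothetical binary maximum-entropy (logistic) model:
    P(good quality | non-textual features)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, lam=0.5):
    """log P(Q | D) with Jelinek-Mercer smoothing against the collection."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts.get(term, 0) / coll_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, weights):
    """Rank by log P(Q | D) + log P(quality | D), i.e. quality as a prior."""
    all_tokens = [tok for doc in docs for tok in doc["tokens"]]
    coll_counts = Counter(all_tokens)
    scored = []
    for doc in docs:
        prior = quality_probability(doc["features"], weights)
        likelihood = query_log_likelihood(
            query, doc["tokens"], coll_counts, len(all_tokens)
        )
        scored.append((likelihood + math.log(prior), doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy data (invented): two answers with identical text but different click counts.
docs = [
    {"id": "a", "tokens": ["reset", "password"], "features": {"clicks": 1.0}},
    {"id": "b", "tokens": ["reset", "password"], "features": {"clicks": 50.0}},
]
weights = {"clicks": 0.1}  # assumed weight, not a learned value
ranking = rank(["password"], docs, weights)
```

With identical text, the two answers have the same query likelihood, so the ordering in `ranking` is decided entirely by the quality prior. In a real system the weights would be learned from answers labeled for quality rather than set by hand.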



      Published In

      SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
      August 2006
      768 pages
      ISBN:1595933697
      DOI:10.1145/1148170

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. document quality
      2. information retrieval
      3. language models
      4. maximum entropy

      Qualifiers

      • Article

      Conference

      SIGIR06: The 29th Annual International SIGIR Conference
      August 6 - 11, 2006
      Seattle, Washington, USA

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
