Finding similar questions in large question and answer archives

Published: 31 October 2005

Abstract

There has recently been a significant increase in the number of community-based question and answer services on the Web where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question and answer service is to find questions in the archive that are semantically similar to a user's question. This enables high-quality answers from the archive to be retrieved and removes the time lag associated with a community-based system. In this paper, we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model. We show that with this model it is possible to find semantically similar questions with relatively little word overlap.
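
To make the idea of a translation-based retrieval model concrete, the following is a minimal sketch and not the paper's implementation. It assumes the common mixture form in which each query word is generated either by "translating" a word of the archived question or by a background collection model. The function name `translation_score`, the smoothing weight `lam`, and the hand-filled table `TRANS` are all hypothetical; the abstract says the probabilities are estimated from answer similarity, which this sketch does not attempt.

```python
import math
from collections import Counter

# Hypothetical, hand-filled table: TRANS[t][w] = P(query word w | archive word t).
# In the paper these probabilities are estimated from data; here they are made up.
TRANS = {
    "laptop": {"laptop": 0.6, "notebook": 0.4},
    "fix": {"fix": 0.6, "repair": 0.4},
}


def t_prob(w, t, trans):
    """P(w | t); words without a table row translate only to themselves."""
    row = trans.get(t)
    if row is None:
        return 1.0 if w == t else 0.0
    return row.get(w, 0.0)


def translation_score(query, doc, trans, collection, lam=0.2):
    """Log-probability of the query given an archived question.

    Each query word mixes a translation term over the words of `doc`
    with a background term from `collection` (weight `lam`, an assumption).
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    log_p = 0.0
    for w in query:
        # Sum over archive words t of P(w | t) * P_ml(t | doc).
        p_trans = sum(
            t_prob(w, t, trans) * c / len(doc) for t, c in doc_counts.items()
        )
        p_coll = coll_counts[w] / len(collection)
        p = (1 - lam) * p_trans + lam * p_coll
        log_p += math.log(p) if p > 0 else float("-inf")
    return log_p


q1 = "how do i fix my laptop".split()
q2 = "best pizza in town".split()
collection = q1 + q2
query = "repair notebook".split()

for name, doc in (("q1", q1), ("q2", q2)):
    print(name, translation_score(query, doc, TRANS, collection))
```

In this toy archive the query shares no words with either archived question, so a score based on exact matches alone (an empty table) cannot separate them, while the translation term lets the first question receive a finite score.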

      Published In

      CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
      October 2005
      854 pages
      ISBN:1595931406
      DOI:10.1145/1099554

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. FAQ retrieval
      2. information retrieval
      3. language models

      Qualifiers

      • Article

      Conference

      CIKM05: Conference on Information and Knowledge Management
      October 31 - November 5, 2005
      Bremen, Germany

      Acceptance Rates

      CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
