DOI: 10.1145/1835449.1835515
research-article

Segmentation of multi-sentence questions: towards effective question retrieval in cQA services

Published: 19 July 2010

Abstract

Existing question retrieval models work relatively well at finding similar questions in community-based question answering (cQA) services. However, they are designed for single-sentence queries or bag-of-words representations and are not sufficient to handle multi-sentence questions accompanied by various contexts. Segmenting questions into topically related parts could help the retrieval system not only better understand the user's different information needs but also fetch the fragments of questions and answers in the cQA archive that are most relevant to the user's query. In this paper, we propose a graph-based approach to segmenting multi-sentence questions. Results from user studies show that our segmentation model outperforms traditional question segmentation systems by over 30% in user satisfaction. We incorporate the segmentation model into an existing cQA question retrieval framework for more targeted question matching, and the empirical evaluation shows that segmentation boosts question retrieval performance by up to 12.93% in Mean Average Precision (MAP) and 11.72% in Top One Precision. Our model also includes a comprehensive question detector equipped with both lexical and syntactic features.
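The retrieval gains above are reported in Mean Average Precision and Top One Precision (precision at rank 1). As a minimal sketch of how these standard ranking metrics are usually computed, assuming each query's retrieved questions have been judged relevant or irrelevant in rank order, the helpers below may be useful. The function names and the sample judgments are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True if the item at rank i + 1 is relevant.
    Assumes every relevant item for the query appears in the ranked list;
    if some are missing, divide by the total number of relevant items instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(run) for run in runs) / len(runs)


def precision_at_1(runs: Sequence[Sequence[bool]]) -> float:
    """Fraction of queries whose top-ranked item is relevant."""
    return sum(1 for run in runs if run and run[0]) / len(runs)


# Hypothetical relevance judgments for two queries (not data from the paper).
runs: List[List[bool]] = [[True, False, True], [False, True, False]]
print(round(mean_average_precision(runs), 4))  # 0.6667
print(precision_at_1(runs))                    # 0.5
```

MAP rewards placing every relevant question high in the list, while Top One Precision only checks whether the first result is relevant, which is why a system can improve the two by different amounts.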

      Published In

      SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
      July 2010
      944 pages
      ISBN:9781450301534
      DOI:10.1145/1835449

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Yahoo! Answers
      2. question answering
      3. question matching
      4. question segmentation

      Conference

      SIGIR '10

      Acceptance Rates

      SIGIR '10 paper acceptance rate: 87 of 520 submissions (17%)
      Overall acceptance rate: 792 of 3,983 submissions (20%)

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 3
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 25 Feb 2025

      Cited By

      • (2023) HSM-QA: Question Answering System Based on Hierarchical Semantic Matching. IEEE Access 11, 77826-77839. DOI: 10.1109/ACCESS.2023.3296850. Online publication date: 2023
      • (2023) A Chinese Dialogue Corpus Annotated with Dialogue Act. Chinese Language Resources, 227-245. DOI: 10.1007/978-3-031-38913-9_14. Online publication date: 19-Dec-2023
      • (2022) PlanHelper. Proceedings of the ACM on Human-Computer Interaction 6:CSCW2, 1-26. DOI: 10.1145/3555555. Online publication date: 11-Nov-2022
      • (2019) Relevant Subsection Retrieval for Law Domain Question Answer System. Data Visualization and Knowledge Engineering, 299-319. DOI: 10.1007/978-3-030-25797-2_13. Online publication date: 10-Aug-2019
      • (2018) Web Forum Retrieval and Text Analytics. Foundations and Trends in Information Retrieval 12:1, 1-163. DOI: 10.1561/1500000062. Online publication date: 3-Jan-2018
      • (2017) Reflexive hybrid approach to provide precise answer of user desired frequently asked question. 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, 159-163. DOI: 10.1109/CONFLUENCE.2017.7943142. Online publication date: Jan-2017
      • (2016) A Comprehensive Survey and Classification of Approaches for Community Question Answering. ACM Transactions on the Web 10:3, 1-63. DOI: 10.1145/2934687. Online publication date: 16-Aug-2016
      • (2015) Predicting associated statutes for legal problems. Information Processing & Management 51:1, 194-211. DOI: 10.1016/j.ipm.2014.07.003. Online publication date: Jan-2015
      • (2015) Research on the Extraction of Wikipedia-Based Chinese-Khmer Named Entity Equivalents. Natural Language Processing and Chinese Computing, 372-379. DOI: 10.1007/978-3-319-25207-0_32. Online publication date: 9-Oct-2015
      • (2015) Mining RDF from Tables in Chinese Encyclopedias. Natural Language Processing and Chinese Computing, 285-298. DOI: 10.1007/978-3-319-25207-0_24. Online publication date: 9-Oct-2015
