DOI: 10.1145/1031171.1031283
Article

Unsupervised question answering data acquisition from local corpora

Published: 13 November 2004

Abstract

Data-driven approaches in question answering (QA) are increasingly common. Because training data for such approaches is very limited, we propose an unsupervised algorithm that generates high-quality question-answer pairs from local corpora. The algorithm is ontology independent and requires only a very small amount of seed data as its starting point. Two alternating views of the data make learning possible: 1) question types are viewed as relations between entities and 2) question types are described by their corresponding question-answer pairs. These two views allow us to construct an unsupervised algorithm that acquires high-precision question-answer pairs. We show the quality of the acquired data for different question types and perform a task-based evaluation. With each iteration, pairs acquired by the unsupervised algorithm are used as training data for a simple QA system. Performance increases with the number of question-answer pairs acquired, confirming the robustness of the unsupervised algorithm. We introduce the notion of semantic drift and show that it is a desirable quality in training data for question answering systems.
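
The alternation between the two views can be illustrated with a generic pattern-bootstrapping loop. The sketch below is not the paper's algorithm: the toy corpus, the function names (learn_patterns, extract_pairs, bootstrap), the use of the literal text between entity and answer as a "pattern", and the min_count threshold are all assumptions made for illustration. It only shows the general idea of going from seed pairs to relation contexts and back to new pairs.

```python
import re
from collections import Counter

# Toy corpus and seed pairs; both are invented for this illustration.
CORPUS = [
    "Mozart was born in Salzburg.",
    "Einstein was born in Ulm.",
    "Darwin was born in Shrewsbury.",
    "Curie was born in Warsaw.",
    "Mozart died in Vienna.",
]
SEEDS = {("Mozart", "Salzburg"), ("Einstein", "Ulm")}


def learn_patterns(corpus, pairs, min_count=2):
    """From known (entity, answer) pairs, collect the text that links them.

    A pattern here is simply the literal string between the entity and the
    answer (an assumption of this sketch). Patterns seen fewer than
    min_count times are discarded to limit noise.
    """
    counts = Counter()
    for sentence in corpus:
        for entity, answer in pairs:
            if entity in sentence and answer in sentence:
                start = sentence.index(entity) + len(entity)
                end = sentence.index(answer)
                if start < end:
                    counts[sentence[start:end]] += 1
    return {pattern for pattern, count in counts.items() if count >= min_count}


def extract_pairs(corpus, patterns):
    """Apply each pattern to the corpus to propose (entity, answer) pairs."""
    found = set()
    for pattern in patterns:
        regex = re.compile(r"(\w+)" + re.escape(pattern) + r"(\w+)")
        for sentence in corpus:
            for match in regex.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(corpus, seeds, max_iters=5):
    """Alternate between the two views until no new pairs are acquired."""
    pairs = set(seeds)
    for _ in range(max_iters):
        patterns = learn_patterns(corpus, pairs)
        new_pairs = extract_pairs(corpus, patterns) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs


acquired = bootstrap(CORPUS, SEEDS)
```

On this toy corpus the two seeds yield the single pattern " was born in ", which in turn acquires the Darwin and Curie pairs; the "died in" sentence is ignored because no known pair supports it. A real system would need scoring of patterns and pairs to keep precision high, which this sketch omits.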

      Published In

      CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
      November 2004
      678 pages
      ISBN:1581138741
      DOI:10.1145/1031171

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data acquisition
      2. question answering
      3. semantic drift
      4. unsupervised learning

      Qualifiers

      • Article

      Conference

      CIKM04
      Sponsor:
      CIKM04: Conference on Information and Knowledge Management
      November 8 - 13, 2004
      Washington, D.C., USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2018) Machine learning for query formulation in question answering. Natural Language Engineering 17:4 (425-454). DOI: 10.1017/S1351324910000276. Online publication date: 21-Dec-2018
      • (2018) Intelligent answering location questions from the web using molecular alignment. Journal of Intelligent Information Systems 35:1 (75-90). DOI: 10.1007/s10844-009-0089-4. Online publication date: 28-Dec-2018
      • (2012) Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts. IEICE Transactions on Information and Systems E95.D:7 (1932-1946). DOI: 10.1587/transinf.E95.D.1932. Online publication date: 2012
      • (2011) Relation Extraction for Open and Closed Domain Question Answering. Interactive Multi-modal Question-Answering (171-197). DOI: 10.1007/978-3-642-17525-1_8. Online publication date: 8-Apr-2011
      • (2007) Model tree learning for query term weighting in question answering. Proceedings of the 29th European conference on IR research (589-596). DOI: 10.5555/1763653.1763724. Online publication date: 2-Apr-2007
      • (2007) Mining web snippets to answer list questions. Proceedings of the 2nd international workshop on Integrating artificial intelligence and data mining - Volume 84 (61-71). DOI: 10.5555/1386993.1387000. Online publication date: 1-Dec-2007
      • (2007) Model Tree Learning for Query Term Weighting in Question Answering. Advances in Information Retrieval (589-596). DOI: 10.1007/978-3-540-71496-5_55. Online publication date: 2007
      • (2006) Molecular sequence alignment for extracting answers for where-typed questions from google snippets. Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I (1190-1197). DOI: 10.1007/11892960_143. Online publication date: 9-Oct-2006