Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1458082.1458281acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
poster

Corpus microsurgery: criteria optimization for medical cross-language ir

Published: 26 October 2008 Publication History

Abstract

Automatic subset selection from a parallel corpus significantly cross-lingual information retrieval (CLIR) performance, in addition to increasing its efficiency. Our selection method extracts relevant training data by incorporating additional criteria (i.e. estimated corpus quality, taxonomy projection and size) in addition to lexical-based criteria. The challenge lies in combining these criteria using a meaningful scoring function that can be used for ranking parallel sentence candidates. We choose weighted geometric mean for its soft-AND properties, and we optimize criteria weights by wrapping the CLIR task in an optimization shell. Due to the indeterminate nature of the search space convexity properties, we have explored continuous reactive tabu search (CRTS), a global optimization method. We use a large parallel corpus in the medical domain to examine the effect of adaptation criteria and their combination on CLIR performance. In our experiments, 100 selected sentences yield 90% of the performance obtained with 5,000 times more in-domain parallel sentences. Our optimized criteria weights considerably outperform the uniform distribution baseline, as well as lexical
similarity adaptation.

References

[1]
R. Battiti and G. Tecchiolli. The continuous reactive tabu search: blending combinatorial optimization and stochastic search for global optimization, 1995.
[2]
M. Rogati. Domain Adaptation of Translation Models for Multilingual Applications. PhD Thesis, unpublished http://www.cs.cmu.edu/~mrogati/thesis.pdf, 2008.
[3]
M. Rogati and Y. Yang. Resource selection for domain-specific cross-lingual IR. In ACM SIGIR Special Interest Group on Information Retrieval Conference (SIGIR 2004), 2004.

Index Terms

  1. Corpus microsurgery: criteria optimization for medical cross-language ir

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
      October 2008
      1562 pages
      ISBN:9781595939913
      DOI:10.1145/1458082
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 October 2008

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. cross-language information retrieval
      2. domain-specific translation

      Qualifiers

      • Poster

      Conference

      CIKM08
      CIKM08: Conference on Information and Knowledge Management
      October 26 - 30, 2008
      California, Napa Valley, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Upcoming Conference

      CIKM '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 158
        Total Downloads
      • Downloads (Last 12 months)1
      • Downloads (Last 6 weeks)1
      Reflects downloads up to 09 Nov 2024

      Other Metrics

      Citations

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media