research-article

To translate or not to translate?

Published: 19 July 2010

Abstract

Query translation is an important task in cross-language information retrieval (CLIR): it translates queries into the languages used in the documents. This paper investigates whether each query term needs to be translated, a need that can differ from one term to another. Some terms cause an irreparable performance drop when left untranslated, while others do not. We propose an approach that estimates the translation probability of a query term, which helps decide whether the term should be translated. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on the NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach can significantly improve CLIR performance. An in-depth analysis also discusses how untranslated out-of-vocabulary (OOV) query terms and the translation quality of non-OOV query terms affect CLIR performance.
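
The idea of scoring each query term and then deciding whether to translate it can be illustrated with a minimal sketch. This is not the paper's implementation: the three features (an IDF-like weight, an OOV flag, a named-entity flag), the toy training data, the 0.5 threshold, and the use of a plain logistic regression in place of the paper's regression and classification models are all assumptions made for illustration.

```python
import math

# Hypothetical training data (not from the paper).
# Features per term: [idf_like_weight, is_oov, is_named_entity]
# Label: 1 if leaving the term untranslated hurt retrieval, else 0.
TRAIN = [
    ([4.0, 0.0, 1.0], 1),
    ([3.5, 0.0, 0.0], 1),
    ([3.0, 1.0, 1.0], 1),
    ([0.5, 0.0, 0.0], 0),
    ([1.0, 0.0, 0.0], 0),
    ([0.8, 1.0, 0.0], 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=1000, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            score = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(score) - label  # gradient of the log loss
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def translate_probability(model, features):
    """Estimated probability that the term should be translated."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def should_translate(model, features, threshold=0.5):
    """Turn the probability into a translate / do-not-translate decision."""
    return translate_probability(model, features) >= threshold


model = train(TRAIN)
for term_features in ([4.0, 0.0, 1.0], [0.5, 0.0, 0.0]):
    print(term_features, should_translate(model, term_features))
```

In a real system the features would be computed from the query and the collection, and the threshold would be tuned on held-out queries rather than fixed in advance.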



    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN:9781450301534
    DOI:10.1145/1835449

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-language information retrieval
    2. query term performance
    3. query translation
    4. translation quality

    Qualifiers

    • Research-article

    Conference

    SIGIR '10
    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2019) Multi-lingual geoparsing based on machine translation. Future Generation Computer Systems, 96:C, pages 667-677. DOI: 10.1016/j.future.2017.07.057. Online publication date: 1-Jul-2019
    • (2016) Arabic Cross-Language Information Retrieval. ACM Transactions on Asian and Low-Resource Language Information Processing, 15:3, pages 1-44. DOI: 10.1145/2789210. Online publication date: 28-Jan-2016
    • (2014) Information Retrieval. Natural Language Processing of Semitic Languages, pages 299-334. DOI: 10.1007/978-3-642-45358-8_10. Online publication date: 25-Mar-2014
    • (2012) Translation techniques in cross-language information retrieval. ACM Computing Surveys, 45:1, pages 1-44. DOI: 10.1145/2379776.2379777. Online publication date: 7-Dec-2012
    • (2011) Is a query worth translating. Proceedings of the 33rd European conference on Advances in information retrieval, pages 238-250. DOI: 10.5555/1996889.1996920. Online publication date: 18-Apr-2011
    • (2011) Is a Query Worth Translating. Proceedings of the 33rd European Conference on Advances in Information Retrieval - Volume 6611, pages 238-250. DOI: 10.1007/978-3-642-20161-5_24. Online publication date: 18-Apr-2011
